A Hybrid and Scalable Sentiment Analysis Framework: Case of Russo-Ukrainian War


Allayla M. A., AYVAZ S.

3rd International Scientific Conference of Engineering Sciences (ISCES 2023), Diyala, Iraq, 3-4 May 2023, pp. 13-18

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/isces58193.2023.10311495
  • City of Publication: Diyala
  • Country of Publication: Iraq
  • Pages: 13-18
  • Keywords: Apache Spark, Big Data, Lexicon Ensemble, Sentiment Analysis, Twitter
  • Yıldız Technical University Affiliation: Yes

Abstract

Sentiment analysis of large datasets on traditional processing engines has become challenging and can lead to computational bottlenecks, so a distributed big-data processing environment should be considered for managing such datasets. Because human labelling in text analysis is ambiguous and subjective, replacing it with automated labelling is becoming necessary to make the process easier and more effective. A further issue is that some lexicon-based methods fail to classify opinions expressed in texts whose terms are not included in the lexicon's vocabulary. We therefore propose a hybrid framework that integrates a lexicon ensemble with an Apache Spark ML pipeline and evaluates multiple feature-combination scenarios to improve performance. As a case study to test its effectiveness, we explored sentiment toward the outbreak of the Russo-Ukrainian war, when people around the world flooded the Twitter social media platform with tweets. Analyzing these tweets can provide valuable insight into public opinion and the international community's reactions to the ongoing crisis. Comparative experiments were conducted on the combined features using four Spark ML classifiers: Naive Bayes (NB), Logistic Regression (LR), Support Vector Classifier (SVC), and Multilayer Perceptron (MLP). Across the hybrid-method experiments, the MLP classifier achieved the highest accuracy, 89.49%, outperforming the other models. The max-voting results of the lexicon ensemble indicate that people generally hold more negative sentiment than positive or neutral sentiment.
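The abstract describes automated labelling by a lexicon ensemble combined through max voting. The following is a minimal sketch of that idea, not the authors' implementation, which the abstract does not detail. It rests on several assumptions. The three lexicons are hypothetical toy dictionaries rather than the lexicons used in the paper. Each lexicon scores a tweet by summing the polarity of its words. The majority label wins, and ties are resolved as neutral.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical toy lexicons (word -> polarity score). Real sentiment lexicons
# are far larger; these words and scores are illustrative only.
LEXICONS: List[Dict[str, float]] = [
    {"war": -1.0, "attack": -1.0, "peace": 1.0, "support": 0.5},
    {"war": -0.5, "killed": -1.0, "hope": 1.0, "peace": 0.8},
    {"attack": -0.7, "help": 0.6, "hope": 0.5, "sad": -0.8},
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only alphabetic tokens (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_label(text: str, lexicon: Dict[str, float]) -> str:
    """Label a text with one lexicon by summing the scores of known words."""
    score = sum(lexicon.get(token, 0.0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # No lexicon terms found (or scores cancel out): fall back to neutral.
    return "neutral"


def max_vote(labels: List[str]) -> str:
    """Return the most frequent label; a tie for first place yields 'neutral' (assumed rule)."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


def ensemble_label(text: str) -> str:
    """Label the text with every lexicon, then combine the votes by max voting."""
    return max_vote([lexicon_label(text, lex) for lex in LEXICONS])


if __name__ == "__main__":
    for tweet in ["Stop the war, we hope for peace",
                  "Another attack, so many killed",
                  "Watching the news today"]:
        print(tweet, "->", ensemble_label(tweet))
```

In this toy setup, the first tweet scores neutral under the first lexicon ("war" and "peace" cancel out) but positive under the other two, so the ensemble labels it positive. This illustrates how voting can compensate for a single lexicon's gaps. In the hybrid framework, labels produced this way would then serve as training targets for the Spark ML classifiers.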