A Hybrid and Scalable Sentiment Analysis Framework: Case of Russo-Ukrainian War

Allayla M. A., AYVAZ S.

3rd International Scientific Conference of Engineering Sciences, ISCES 2023, Diyala, Iraq, 3-4 May 2023, pp.13-18

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/isces58193.2023.10311495
  • City: Diyala
  • Country: Iraq
  • Page Numbers: pp.13-18
  • Keywords: Apache Spark, Big Data, Lexicon Ensemble, Sentiment Analysis, Twitter
  • Yıldız Technical University Affiliated: Yes


Sentiment analysis of large datasets with traditional engines has become challenging and can cause computational bottlenecks, so a distributed big-data processing environment should be considered for managing such data. Because human labelling of text is ambiguous and subjective, replacing it with automated labelling is becoming necessary to make the process easier and more effective. Another important issue is that some lexicon methods are ineffective at classifying opinions in texts whose terms are not covered by the lexicon. We therefore propose a hybrid framework that integrates a lexicon ensemble with an Apache Spark ML pipeline and evaluates multiple feature-combination scenarios to improve performance. As a case study to test its effectiveness, we explored sentiment towards the outbreak of the Russo-Ukrainian war, when people around the world began flooding Twitter with tweets. Analyzing these tweets can provide valuable insight into public opinion and the international community's reactions to the ongoing crisis. Comparative experiments were conducted on the combined features using four Spark ML classifiers: Naive Bayes (NB), Logistic Regression (LR), Support Vector Classifier (SVC), and Multilayer Perceptron (MLP). Across the hybrid-method experiments, the MLP classifier achieved the highest accuracy, 89.49%. The max-voting lexicon ensemble indicates that people generally expressed more negative sentiment than positive or neutral.
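
The max-voting step of a lexicon ensemble can be illustrated with a minimal Python sketch. The abstract does not name the lexicons, the score thresholds, or the tie-breaking rule, so everything below is an assumption for illustration: the three toy dictionaries (`LEXICON_A`, `LEXICON_B`, `LEXICON_C`), the `threshold` value, and the choice to fall back to "neutral" on a tie are hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical toy lexicons (word -> polarity). A real ensemble would wrap
# established lexicon tools; these values are invented for illustration.
LEXICON_A = {"good": 1.0, "peace": 0.8, "war": -0.9, "attack": -0.7}
LEXICON_B = {"good": 0.6, "hope": 0.7, "war": -0.5, "crisis": -0.8}
LEXICON_C = {"peace": 0.5, "hope": 0.4, "attack": -0.6, "crisis": -0.4}


def label_with_lexicon(text, lexicon, threshold=0.05):
    """Sum word polarities and map the total to a sentiment label."""
    score = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def max_vote(text, lexicons):
    """Return the label most lexicons agree on.

    Ties fall back to "neutral"; this rule is an assumption of the sketch.
    """
    votes = Counter(label_with_lexicon(text, lex) for lex in lexicons)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


if __name__ == "__main__":
    ensemble = [LEXICON_A, LEXICON_B, LEXICON_C]
    for tweet in ["war and attack bring crisis", "hope for peace"]:
        print(tweet, "->", max_vote(tweet, ensemble))
```

Labels produced this way could then serve as automatically generated training targets for downstream classifiers, which is the role the abstract assigns to automated labelling.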