TEXT CLASSIFICATION ON TURKISH NEWS USING DEEP LEARNING



Doğan S., Çene E.

7th International Congress on Fundamental and Applied Sciences 2020 (ICFAS2020), Priştine, Kosovo, 06 October 2020, pp.57

  • Publication Type: Conference Paper / Summary Text
  • City: Priştine
  • Country: Kosovo
  • Page Numbers: pp.57

Abstract

In this study, text classification performance is measured on a Turkish news corpus with 9 categories, using 3 proposed models based on Artificial Neural Networks and Recurrent Neural Networks. The corpus consists of 32,538 news articles in the categories sports, world news, Turkey, economics, magazine, politics, health, art & culture, and technology. The models are trained after the news texts are shortened using an extractive text summarization method. Text summarization not only helped achieve a more stable distribution of news length and reduced bias, but also greatly reduced model training times. The 3 models suggested in this work are Artificial Neural Networks, Bidirectional Long Short-Term Memory (LSTM) units, and Bidirectional Gated Recurrent Units (GRU) based on an attention mechanism. Each proposed model is retrained using pre-trained FastText word vectors. Moreover, this is the first study in Turkish that uses recurrent neural networks with the attention mechanism. The F1 values of the proposed models range from 89.55% to 91.32%, with the LSTM-based model found to be the most successful.
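
The abstract does not say which extractive summarization method was used to shorten the news texts. As an illustration only, the following minimal sketch assumes a simple word-frequency scoring scheme: each sentence is scored by the mean corpus frequency of its words, and the top-scoring sentences are kept in their original order. The function name `summarize` and the parameter `max_sentences` are hypothetical and do not come from the study.

```python
import re
from collections import Counter


def tokenize(text):
    # Unicode-aware word split; str.lower() is only an approximation
    # for Turkish casing (it does not map "I" to the dotless "ı").
    return re.findall(r"\w+", text.lower())


def summarize(text, max_sentences=3):
    """Keep the highest-scoring sentences, in their original order.

    Assumed scoring (not taken from the study): mean word frequency.
    """
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole text (no stop-word removal).
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


example = (
    "Galatasaray won the match. The match was played in Istanbul. "
    "Weather was nice. Fans of Galatasaray celebrated the match result."
)
short = summarize(example, max_sentences=2)
```

Capping every article at a fixed number of sentences is one way such a step could narrow the distribution of text lengths before training, which is the effect the abstract reports.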