Comparison of Topic Modeling Methods for Type Detection of Turkish News (Türkçe Haberlerin Tür Tespiti İçin Konu Modelleme Yöntemlerinin Karşılaştırılması)


Creative Commons License

Güven Z. A., Diri B., Cakaloglu T.

4th International Conference on Computer Science and Engineering (UBMK), Samsun, Türkiye, 11-15 September 2019

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/ubmk.2019.8907050
  • City of Publication: Samsun
  • Country of Publication: Türkiye
  • Keywords: Topic Modelling, Latent Dirichlet Allocation, Natural Language Processing, News Analysis, Non-Negative Matrix Factorization, Latent Semantic Analysis
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

Today, with the growth of Internet-based documents, we face large amounts of data that need to be processed and evaluated. Media, news and advertising are some of the areas where these data are evaluated. For people working in the media sector, classifying news is an important problem in terms of time. This paper aims to determine which type each news title belongs to. The dataset consists of 4200 Turkish news titles belonging to 7 class labels. To determine the types, the classical Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF) algorithms were used for topic modeling. In addition, the LDA-based n-LDA method was used. The accuracy of all methods was measured and compared. NMF was the most successful method for three classes, while LSA was the most successful for five and seven classes.
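
The abstract does not say how accuracy was computed for the unsupervised topic models. One common approach, shown here only as an assumption and not as the authors' procedure, is to assign each title to its dominant topic, map every topic to the class label that occurs most often among its titles, and count the matches. The function names and the toy data below are hypothetical.

```python
from collections import Counter, defaultdict


def topic_to_label_map(topics, labels):
    """Map each topic id to the most frequent true label among its documents."""
    counts = defaultdict(Counter)
    for topic, label in zip(topics, labels):
        counts[topic][label] += 1
    return {topic: c.most_common(1)[0][0] for topic, c in counts.items()}


def mapped_accuracy(topics, labels):
    """Accuracy after relabelling each topic with its majority class."""
    if len(topics) != len(labels) or not labels:
        raise ValueError("topics and labels must be non-empty and equal in length")
    mapping = topic_to_label_map(topics, labels)
    correct = sum(mapping[t] == y for t, y in zip(topics, labels))
    return correct / len(labels)


# Hypothetical toy data: dominant topic per title and its true class label.
topics = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["sports", "sports", "economy", "economy",
          "economy", "health", "health", "sports"]

mapping = topic_to_label_map(topics, labels)
accuracy = mapped_accuracy(topics, labels)
print(mapping)
print(accuracy)
```

Under this assumed scoring, the dominant-topic list could come from any of the compared models (LDA, LSA or NMF), so the same function would allow a like-for-like comparison across methods and across the three-, five- and seven-class settings.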