Comparison of Topic Modeling Methods for Type Detection of Turkish News (Türkçe Haberlerin Tür Tespiti İçin Konu Modelleme Yöntemlerinin Karşılaştırılması)

Güven Z. A., Diri B., Cakaloglu T.

4th International Conference on Computer Science and Engineering (UBMK), Samsun, Türkiye, 11-15 September 2019 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/ubmk.2019.8907050
City of Publication: Samsun
Country of Publication: Türkiye
Keywords: Topic Modelling, Latent Dirichlet Allocation, Natural Language Processing, News Analysis, Non-Negative Matrix Factorization, Latent Semantic Analysis
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Yıldız Technical University: Yes

Abstract

Today, as the number of Internet-based documents grows, we face large amounts of data that need to be processed and evaluated. Media, news and advertising are some of the areas where these data are evaluated. For people working in the media sector, classifying news is an important problem in terms of time. This paper aims to determine which type each news title belongs to. The dataset consists of 4200 Turkish news titles belonging to 7 class labels. To determine the types, the classical Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF) algorithms were used for topic modeling. In addition, the LDA-based n-LDA method was used. The accuracy of all methods was measured and compared. NMF was the most successful method for three classes, while LSA was the most successful for five and seven classes.
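
The abstract reports accuracy for unsupervised topic models but does not say how topics were matched to class labels. One common approach, assumed here and not confirmed by the paper, is to assign each topic the class label that occurs most often among its documents and then count the matches. The Python sketch below illustrates that idea; the function names, the toy topic assignments and the class names ("sports", "economy", "health") are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter, defaultdict


def map_topics_to_labels(topic_ids, true_labels):
    """Map each topic id to the class label that occurs most often in it."""
    counts = defaultdict(Counter)
    for topic, label in zip(topic_ids, true_labels):
        counts[topic][label] += 1
    return {topic: c.most_common(1)[0][0] for topic, c in counts.items()}


def topic_accuracy(topic_ids, true_labels):
    """Fraction of documents whose mapped topic label equals the true label."""
    mapping = map_topics_to_labels(topic_ids, true_labels)
    hits = sum(mapping[t] == y for t, y in zip(topic_ids, true_labels))
    return hits / len(true_labels)


# Hypothetical toy data: the dominant topic of each title (as produced by
# any topic model, e.g. LDA, LSA or NMF) next to the title's gold label.
topics = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["sports", "sports", "economy", "economy", "economy",
          "health", "health", "sports"]

accuracy = topic_accuracy(topics, labels)
print(f"accuracy = {accuracy:.3f}")  # prints "accuracy = 0.750"
```

Under this assumed scheme, the same function could be applied to the dominant-topic output of each method, so that the resulting accuracies are directly comparable across three, five and seven classes.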