IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION

GÜVEN, ZEKERİYA; DİRİ, BANU; ÇAKALOĞLU, TOLGAHAN

doi:10.7494/csci.2022.23.3.4622

GÜVEN Z. A., DİRİ B., ÇAKALOĞLU T.

Computer Science, vol.23, no.3, pp.377-396, 2022 (Scopus)

Publication Type: Article / Full Article
Volume: 23 Issue: 3
Publication Date: 2022
DOI: 10.7494/csci.2022.23.3.4622
Journal Name: Computer Science
Journal Indexes: Scopus
Page Numbers: pp.377-396
Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

Data analysis becomes more difficult as the volume of data grows. In particular, extracting meaningful insights from large collections and grouping items by their shared features without human intervention requires advanced methods. In text analysis, topic modeling methods address this problem for downstream tasks such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers (Random Forest, AdaBoost, Naive Bayes, and Logistic Regression) using classical LDA and n-stage LDA topic modeling for feature extraction in headline classification. We run our experiments on publicly available Turkish and English datasets with 3 and 5 classes. We demonstrate that n-stage LDA as a feature extractor obtains state-of-the-art performance for any downstream classifier. Random Forest was the most successful algorithm on both datasets.
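
The setup described in the abstract, topic distributions from LDA used as features for several classifiers, can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the authors' implementation. The toy headlines, the labels, the helper name lda_topic_features, and all parameter values are hypothetical. It covers only classical LDA; the n-stage procedure is not described in the abstract and is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy headlines with 3 classes, for illustration only.
headlines = [
    "team wins league title after late goal",
    "striker scores twice in cup final",
    "coach praises defence after away win",
    "central bank raises interest rates again",
    "stock market falls as inflation rises",
    "currency weakens after bank rate decision",
    "new phone launches with faster chip",
    "software update fixes security bug",
    "chip maker unveils faster processor",
]
labels = ["sport"] * 3 + ["economy"] * 3 + ["tech"] * 3


def lda_topic_features(texts, n_topics=3, seed=0):
    """Return one topic distribution per text (hypothetical helper)."""
    # Bag-of-words counts are the usual input to LDA.
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    # fit_transform yields a (n_texts, n_topics) document-topic matrix.
    return lda.fit_transform(counts)


features = lda_topic_features(headlines)

# The four classifier families named in the abstract, with assumed settings.
classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Accuracy on the training data only: a smoke test, not an evaluation.
train_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(features, labels)
    train_accuracy[name] = clf.score(features, labels)

for name, acc in train_accuracy.items():
    print(f"{name}: {acc:.2f}")
```

A real benchmark would hold out test data or use cross-validation, and would fit the vectorizer and LDA model on the training split only, so that the topic features do not leak information from the test headlines.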