Automatic Turkish text categorization in terms of author, genre and gender

Amasyalı, Mehmet; Diri, Banu

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, vol. 3999, pp. 221-226, 2006 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 3999
Publication Date: 2006
Journal Name: NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, EMBASE, MathSciNet, Philosopher's Index, zbMATH
Pages: pp. 221-226
Affiliated with Yıldız Technical University: Yes

Abstract

This study presents the first comprehensive text classification for Turkish based on an n-gram model. We address three tasks: automatically identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classification methods, and their results are compared. The success rates for determining the author of the text, the genre of the text and the gender of the author were 83%, 93% and 96%, respectively.
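
To make the general idea of n-gram-based classification concrete, the following is a minimal sketch of a Naive Bayes classifier over n-gram counts. It is not the authors' implementation: the abstract does not say whether the n-grams are character-level or word-level, so character bigrams are assumed here, and the names `char_ngrams` and `NgramNaiveBayes`, the add-one smoothing, and the toy corpus with its labels are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a lowercased text."""
    # Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı").
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (assumed setup)."""

    def __init__(self, n=2):
        self.n = n
        self.class_docs = Counter()               # documents seen per label
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_docs.items():
            counts = self.ngram_counts[label]
            # Add-one (Laplace) smoothing over the training vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus; sentences and genre labels are illustrative only.
train_texts = ["takım maçı kazandı", "borsa bugün yükseldi"]
train_labels = ["sports", "economy"]
model = NgramNaiveBayes(n=2).fit(train_texts, train_labels)
print(model.predict("maç"))
```

The same feature extraction could feed any of the other classifiers named in the abstract; Naive Bayes is used here only because it fits in a few lines without external libraries.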