Automatic Turkish text categorization in terms of author, genre and gender


Amasyalı M. F., Diri B.

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, vol.3999, pp.221-226, 2006 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 3999
  • Publication Date: 2006
  • Journal Name: NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, EMBASE, MathSciNet, Philosopher's Index, zbMATH
  • Page Numbers: pp.221-226
  • Affiliated with Yıldız Technical University: Yes

Abstract

In this study, the first comprehensive text classification using an n-gram model has been carried out for Turkish. We worked on three tasks, all performed automatically: identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classification methods, and their results are compared. The success rates obtained for determining the author of the text, the genre of the text and the gender of the author were 83%, 93% and 96%, respectively.
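
To make the general idea of n-gram based classification concrete, here is a minimal sketch in Python that pairs character n-gram counts with a multinomial Naive Bayes classifier. It is not the authors' implementation. The abstract does not say whether the n-grams are character-level or word-level, so character bigrams are an assumption here. The names `char_ngrams` and `NgramNaiveBayes`, the add-one smoothing, and the four toy sentences with their "spor"/"ekonomi" labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a lowercased text.

    Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı"),
    so a real system would need locale-specific case folding.
    """
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (add-one smoothing)."""

    def __init__(self, n=2):
        self.n = n
        self.class_counts = Counter()              # training documents per class
        self.ngram_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()                         # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        # N-grams never seen in training carry no evidence, so skip them.
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)   # log prior
            for g in grams:
                score += math.log((counts[g] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy genre data, invented for this example (not from the paper's corpus).
train_texts = [
    "takım maçı kazandı ve gol attı",
    "futbol ligi bu hafta başladı",
    "borsa bugün yükseldi ve dolar düştü",
    "merkez bankası faiz kararını açıkladı",
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]

model = NgramNaiveBayes(n=2).fit(train_texts, train_labels)
prediction = model.predict("takım gol attı")
print(prediction)
```

The same feature extraction could feed any of the other classifiers named in the abstract. Only the `fit` and `predict` steps would change, which is what makes a side-by-side comparison of methods straightforward.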