Automatic Turkish text categorization in terms of author, genre and gender


AMASYALI M. F. , DİRİ B.

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, vol. 3999, pp. 221-226, 2006

  • Volume: 3999
  • Publication Date: 2006
  • Journal Name: NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS
  • Pages: 221-226

Abstract

This study presents a first comprehensive n-gram-based text classification for Turkish. We address three tasks, all performed automatically: identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classification methods, and their results are compared. The success rates for author identification, genre classification and author gender identification were 83%, 93% and 96%, respectively.
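
To make the general idea concrete, the following is a minimal sketch of n-gram-based text classification with a Naive Bayes classifier. It is not the authors' implementation: the abstract does not say whether the n-grams are character-level or word-level, which n was used, or how features were weighted, so character bigrams, Laplace smoothing, the names `char_ngrams` and `NgramNaiveBayes`, and the four toy sentences are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Toy multinomial Naive Bayes over character n-gram counts (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                      # n-gram length (assumed, not from the paper)
        self.alpha = alpha              # Laplace smoothing constant
        self.class_docs = Counter()     # number of training documents per class
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()              # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.class_docs:
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.class_docs[label] / total_docs)
            for g in grams:
                score += math.log(
                    (counts[g] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data for a genre task (not from the paper's corpus).
texts = [
    "takım maçı kazandı",
    "futbol maçı bu akşam",
    "borsa bugün yükseldi",
    "dolar ve faiz yükseldi",
]
labels = ["sports", "sports", "economy", "economy"]

model = NgramNaiveBayes(n=2).fit(texts, labels)
print(model.predict("maç"))
print(model.predict("faiz"))
```

The same pipeline applies to the author and gender tasks by changing only the labels. The other classifiers named in the abstract (Support Vector Machine, C4.5, Random Forest) would consume the same n-gram counts as a feature vector rather than through the per-class probabilities used here.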