Automatic Turkish text categorization in terms of author, genre and gender


Amasyalı M. F., Diri B.

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, vol.3999, pp.221-226, 2006 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 3999
  • Publication Date: 2006
  • Journal Name: NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, EMBASE, MathSciNet, Philosopher's Index, zbMATH
  • Page Numbers: pp.221-226
  • Yıldız Technical University Affiliated: Yes

Abstract

In this study, a first comprehensive text classification using an n-gram model has been carried out for Turkish. We worked on three tasks: automatically identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classification methods, and their results are reported comparatively. The success rates for determining the author of the text, the genre of the text and the gender of the author were 83%, 93% and 96%, respectively.
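
As a rough illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of n-gram features fed to a Naive Bayes classifier. It is not the authors' implementation. The abstract does not say whether the n-grams are over characters or words, so character bigrams are an assumption here, as are add-one smoothing, the names `char_ngrams` and `NgramNaiveBayes`, and the four invented toy sentences (which are not from the study's corpus).

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a lowercased text."""
    # Note: str.lower() does not apply Turkish casing rules (I -> ı);
    # the toy texts below are already lowercase, so this is harmless here.
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (add-one smoothing)."""

    def __init__(self, n=2):
        self.n = n
        self.class_doc_counts = Counter()         # documents seen per class
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_doc_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data for a genre task (hypothetical, not the study's corpus).
train_texts = [
    "takım maçı kazandı ve gol attı",
    "futbol maçında hakem penaltı verdi",
    "merkez bankası faiz oranını artırdı",
    "borsa ve döviz kuru bugün yükseldi",
]
train_labels = ["sports", "sports", "economy", "economy"]

model = NgramNaiveBayes(n=2).fit(train_texts, train_labels)
prediction = model.predict("hakem maçı durdurdu")
print(prediction)
```

The same feature extraction could be reused for the author and gender tasks by changing only the labels. The other classifiers named in the abstract (Support Vector Machine, C4.5, Random Forest) would take the same n-gram counts as input but are not sketched here.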