Author Attribution of Turkish Texts by Feature Mining


Türkoğlu F., Diri B., Amasyalı M. F.

Third International Conference on Intelligent Computing, ICIC 2007, Qingdao, China, 21 August 2007, vol.1, pp.1086-1093

  • Publication Type: Conference Paper / Full Text
  • Volume: 1
  • City: Qingdao
  • Country: China
  • Page Numbers: pp.1086-1093
  • Yıldız Technical University Affiliated: Yes

Abstract

The aim of this study is to identify the author of a document of unknown authorship. Ten different feature vectors are built from authorship attributes, n-grams, and various combinations of the two, all extracted from the documents whose authors are to be identified. The performance of each feature vector is compared using Naïve Bayes, Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Random Forest (RF) and Multilayer Perceptron (MLP) classifiers. The most successful classifiers are MLP and SVM. In the document classification experiments, n-grams give higher accuracy than authorship attributes. Nevertheless, using n-grams and authorship attributes together gives better results than using either alone.
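The abstract does not spell out the exact feature definitions, so the following is only a minimal Python sketch of the general pipeline it describes, under stated assumptions: character bigram frequencies stand in for the n-gram features, three hypothetical stylometric measures (`avg_word_len`, `avg_sent_len`, `comma_rate`) stand in for the authorship attributes, and a cosine-similarity k-NN (one of the five classifiers named above) assigns the author. Each feature block is L2-normalized before the two are concatenated so that neither dominates the combined vector by scale alone; that normalization is a choice made for this sketch, not something the abstract states.

```python
import math
import re
from collections import Counter


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def char_ngrams(text, n=2):
    """Relative frequencies of character n-grams (n=2 is an illustrative assumption)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {"ng:" + g: c / total for g, c in grams.items()}


def style_features(text):
    """Hypothetical stylometric attributes; the paper's actual attribute set may differ."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        "st:avg_word_len": sum(len(w) for w in words) / n_words,
        "st:avg_sent_len": len(words) / (len(sentences) or 1),
        "st:comma_rate": text.count(",") / n_words,
    }


def combined_features(text):
    """Combined vector: both blocks normalized separately, then merged (prefixed keys never clash)."""
    return {**l2_normalize(char_ngrams(text)), **l2_normalize(style_features(text))}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """Majority vote among the k (vector, author) pairs most cosine-similar to `query`."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Usage: build `train` as a list of `(combined_features(text), author)` pairs and call `knn_predict(train, combined_features(unknown_text))`; swapping `combined_features` for `char_ngrams` or `style_features` alone gives the single-source variants the abstract compares. One caveat for Turkish input: Python's `str.lower()` maps `"I"` to `"i"` rather than the Turkish dotless `"ı"`, so a Turkish-aware case-folding step would be needed for faithful n-gram counts.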

Keywords

Author attribution, n-grams, Text classification, Feature extraction, Turkish documents