Efficient feature integration with Wikipedia-based semantic feature extraction for Turkish text summarization

Guran, Aysun; Güler Bayazıt, Nilgün; Gurbuz, Mustafa

doi:10.3906/elk-1201-15

Guran A., Güler Bayazıt N., Gurbuz M. Z.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.21, pp.1411-1425, 2013 (SCI-Expanded, Scopus, TRDizin)

Publication Type: Article / Full Article
Volume: 21
Publication Date: 2013
DOI: 10.3906/elk-1201-15
Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
Pages: pp.1411-1425
Keywords: Turkish text summarization, latent semantic analysis, analytical hierarchical process, artificial bee colony algorithm, Turkish Wikipedia
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Yıldız Technical University: Yes

Abstract

This study presents a novel hybrid Turkish text summarization system that combines structural and semantic features. The system uses 5 structural features, 1 of which is newly proposed, and 3 semantic features whose values are extracted from Turkish Wikipedia links. The features are combined using weights calculated by 2 novel approaches. The first approach uses an analytical hierarchical process, which relies on a series of expert judgments expressed as pairwise comparisons of the features. The second approach uses the artificial bee colony algorithm to determine the feature weights automatically. To confirm the significance of the proposed hybrid system, its performance is evaluated on a new Turkish corpus that contains 110 documents and 3 human-generated extractive summary corpora. The experimental results show that combining all of the features yields better performance than using any single feature on its own.
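
The idea of turning pairwise expert comparisons into feature weights, then scoring sentences by a weighted sum of their features, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the row geometric mean approximation to the priority vector (a common shortcut, which the abstract does not say the paper uses), and the 3-feature comparison matrix, the feature values, and the names `ahp_weights` and `score_sentence` are all hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate priority weights from a reciprocal pairwise comparison matrix.

    Uses normalized row geometric means, an assumed stand-in for whatever
    weight derivation the paper actually applies.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def score_sentence(features, weights):
    """Combine one sentence's feature values into a single weighted score."""
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical expert judgments: entry [i][j] states how many times more
# important feature i is than feature j (so [j][i] is its reciprocal).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical, already normalized feature values for three sentences.
sentence_features = {
    "s1": [0.9, 0.2, 0.1],
    "s2": [0.3, 0.8, 0.5],
    "s3": [0.1, 0.1, 0.9],
}

# Rank sentences from highest to lowest combined score.
ranking = sorted(
    sentence_features,
    key=lambda sid: score_sentence(sentence_features[sid], weights),
    reverse=True,
)
print(weights)
print(ranking)
```

An extractive summary would then keep the top-ranked sentences. In the second approach described in the abstract, the weights would instead come from an optimizer such as the artificial bee colony algorithm, with `score_sentence` left unchanged.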