An Integrated Approach to Automatic Synonym Detection in Turkish Corpus


Yildiz T., Yildirum S., DİRİ B.

9th International Conference on Natural Language Processing (NLP), Warszawa, Polonya, 17 - 19 Eylül 2014, cilt.8686, ss.116-119 identifier

  • Yayın Türü: Bildiri / Tam Metin Bildiri
  • Cilt numarası: 8686
  • Basıldığı Şehir: Warszawa
  • Basıldığı Ülke: Polonya
  • Sayfa Sayıları: ss.116-119
  • Yıldız Teknik Üniversitesi Adresli: Evet

Özet

In this study, we designed a model to determine synonymy. Our main assumption is that synonym pairs show similar semantic and dependency relation by the definition. They share same meronym/holonym and hypernym/hyponym relations. Contrary to synonymy, hypernymy and meronymy relations can probably be acquired by applying lexico-syntactic patterns to a big corpus. Such acquisition might be utilized and ease detection of synonymy. Likewise, we utilized some particular dependency relations such as object/subject of a verb, etc. Machine learning algorithms were applied on all these acquired features. The first aim is to find out which dependency and semantic features are the most informative and contribute most to the model. Performance of each feature is individually evaluated with cross validation. The model that combines all features shows promising results and successfully detects synonymy relation. The main contribution of the study is to integrate both semantic and dependency relation within distributional aspect. Second contribution is considered as being first major attempt for Turkish synonym identification based on corpus-driven approach.

Abstract

In this study, we designed a model to determine synonymy. Our main assumption is that synonym pairs show similar semantic and dependency relation by the definition. They share same meronym/holonym and hypernym/hyponym relations. Contrary to synonymy, hypernymy and meronymy relations can probably be acquired by applying lexico-syntactic patterns to a big corpus. Such acquisition might be utilized and ease detection of synonymy. Likewise, we utilized some particular dependency relations such as object/subject of a verb, etc. Machine learning algorithms were applied on all these acquired features. The first aim is to find out which dependency and semantic features are the most informative and contribute most to the model. Performance of each feature is individually evaluated with cross validation. The model that combines all features shows promising results and successfully detects synonymy relation. The main contribution of the study is to integrate both semantic and dependency relation within distributional aspect. Second contribution is considered as being first major attempt for Turkish synonym identification based on corpus-driven approach.

Keywords

Synonym near-synonym pattern-based dependency relations