MS-TR: A Morphologically enriched sentiment Treebank and recursive deep models for compositional semantics in Turkish

Zeybek S., Koc E., Secer A.

COGENT ENGINEERING, vol.8, no.1, 2021 (ESCI)

  • Publication Type: Article
  • Volume: 8 Issue: 1
  • Publication Date: 2021
  • Doi Number: 10.1080/23311916.2021.1893621
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus
  • Keywords: Recursive neural networks, sentiment analysis, sentiment treebank, opinion mining, morphologically rich languages
  • Yıldız Technical University Affiliated: Yes


Recursive Deep Models have been used as powerful models for learning compositional representations of text in many natural language processing tasks. However, they require structured input (i.e., a sentiment treebank): sentences must be encoded according to their tree structure so that the models can learn the latent semantics of words through recursive composition functions. In this paper, we present our contributions and efforts toward the construction of a Turkish sentiment treebank. We introduce MS-TR, a Morphologically Enriched Sentiment Treebank, which was built for training Recursive Deep Models to address compositional sentiment analysis for Turkish, one of the well-known Morphologically Rich Languages (MRLs). We propose a semi-supervised automatic annotation method, as a distant-supervision approach, that uses the morphological features of words to infer the polarity of the inner nodes of MS-TR as positive or negative. The proposed annotation model has four annotation levels: morph-level, stem-level, token-level, and review-level. The contribution of each annotation level was tested on datasets from three different domains: product reviews, movie reviews, and the Turkish Natural Corpus essays. Comparative results were obtained with the Recursive Neural Tensor Network (RNTN) model, which operates over MS-TR, and with conventional machine learning methods. Experiments showed that RNTN achieved considerably higher accuracy than the baseline methods, which cannot accurately capture aggregated sentiment information.
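
To illustrate what a recursive composition function over a treebank looks like, the following is a minimal sketch of the commonly cited RNTN formulation, in which a parent vector is computed from its two children as p = tanh(cᵀ V c + W c), with c the concatenation of the child vectors. It is not the implementation used in the paper: the embedding size, the random parameters, the omission of bias terms, the two-class (positive/negative) classifier, and the example tree are all assumptions made for illustration.

```python
import math
import random

D = 4  # embedding size (illustrative assumption, not from the paper)


def compose(left, right, W, V):
    """RNTN-style composition: p[k] = tanh(c^T V[k] c + W[k] c), c = [left; right]."""
    c = left + right  # list concatenation, length 2 * D
    n = len(c)
    parent = []
    for k in range(len(left)):
        # Bilinear tensor term for output dimension k.
        tensor = sum(c[i] * V[k][i][j] * c[j] for i in range(n) for j in range(n))
        # Standard linear term (bias omitted for brevity).
        linear = sum(W[k][i] * c[i] for i in range(n))
        parent.append(math.tanh(tensor + linear))
    return parent


def encode(tree, embed, W, V):
    """Encode a binary tree given as a leaf string or a (left, right) tuple."""
    if isinstance(tree, str):
        return embed[tree]
    left, right = tree
    return compose(encode(left, embed, W, V), encode(right, embed, W, V), W, V)


def classify(vec, Ws):
    """Softmax over class scores; here two classes (negative, positive)."""
    scores = [sum(w * x for w, x in zip(row, vec)) for row in Ws]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


# Hypothetical token-level leaves; a morph-level tree would split tokens further.
embed = {w: rand_vec(D) for w in ("film", "çok", "güzel")}
W = [rand_vec(2 * D) for _ in range(D)]                          # D x 2D matrix
V = [[rand_vec(2 * D) for _ in range(2 * D)] for _ in range(D)]  # D x 2D x 2D tensor
Ws = [rand_vec(D) for _ in range(2)]                             # 2 x D classifier

tree = ("film", ("çok", "güzel"))
root = encode(tree, embed, W, V)
probs = classify(root, Ws)
print(probs)
```

In this formulation every inner node receives its own vector and can therefore carry its own polarity label, which is why a treebank with annotated inner nodes is needed for training.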