N-gram Based Approach to Recognize the Twitter Accounts of Turkish Daily Newspapers


Creative Commons License

MAYDA I., YEŞİLTEPE M.

2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Türkiye, 16 - 17 September 2017

Abstract

Twitter is one of the most popular social media networks in the world. It is widely used by corporations and media organizations as well as by individual users. Media organizations use Twitter to announce news. Although the language of the shared news is formal, the words each organization prefers when sharing information differ. In this study, we propose an approach to recognize the Twitter accounts of Turkish daily newspapers. Our approach uses character 3-grams and word 2-grams to convert the texts into numerical features. To classify the texts, we ran experiments with several classifiers and found that Sequential Minimal Optimization (SMO) outperformed the other algorithms. We carried out the experiments on a real dataset of Twitter accounts of Turkish daily newspapers and classified them with an accuracy of more than 98%.
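
The feature extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify tokenization, casing, or weighting, so everything below is an assumption made for illustration: the function names (`char_ngrams`, `word_ngrams`, `extract_features`), whitespace tokenization, lowercasing, raw counts as feature values, and the sample tweet are all hypothetical and are not taken from the study.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a text (assumed lowercased)."""
    # Note: str.lower() is not Turkish-locale aware (it maps "I" to "i", not "ı"),
    # so a real pipeline for Turkish would likely need locale-specific casing.
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n=2):
    """Return the overlapping word n-grams, assuming whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count character 3-grams and word 2-grams in one sparse feature map."""
    features = Counter()
    # Prefixes keep the two feature families apart (an illustrative choice).
    features.update("c3:" + g for g in char_ngrams(text, 3))
    features.update("w2:" + g for g in word_ngrams(text, 2))
    return features


# Hypothetical tweet text, used only to show the shape of the output.
tweet = "son dakika haberi"
features = extract_features(tweet)
```

Feature maps of this kind, built per tweet or per account, could then be passed to a classifier such as a support vector machine trained with SMO; which weighting and classifier settings the study used is not stated in the abstract.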
