Am I typing fresh tweets: Detecting up-to-dateness and worth of categorical information in microblogs


Cingiz M. Ö., Diri B., Biricik G.

EXPERT SYSTEMS WITH APPLICATIONS, vol.42, pp.5256-5263, 2015 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 42
  • Publication Date: 2015
  • DOI: 10.1016/j.eswa.2015.02.025
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.5256-5263
  • Yıldız Technical University Affiliated: Yes

Abstract

Microblogs are one of the most popular social network areas where users share their opinions, daily activities, interests or other user content. As microblogs generally pose the user’s interests, the field of interests can be extracted by using the presented content. In this study, we group microblog users as normal or bot depending on their supplied content and evaluate the user groups with respect to how well they reflect their categories with fresh entries, essentially by using content mining. Traditional content mining studies do not evaluate whether the supplied user entries are up-to-date or not. Unlike similar studies, we check up-to-dateness of users’ content by simultaneously retrieving user entries and RSS news feeds. If a term of user content is absent in the feature set that is formed by RSS news feeds, it is not regarded as a feature to check the freshness of the content. For each user group, we divide users into predefined categories and inspect how well the group users post relevant entries while checking the up-to-dateness of their content. Our experimental results prove that bot users always post fresher and category-relevant entries. Finally, we visualize the categorization performances of each user group’s entries with Cobweb. The Cobweb presentation unveils the miscategorization tendencies of the user groups.
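The term-filtering step described in the abstract (a term of a user entry counts as a feature only if it also appears in the feature set formed from RSS news feeds) could look roughly like the sketch below. It is an illustration only, not the authors' implementation: the function names `tokenize`, `build_feature_set`, and `fresh_terms` are hypothetical, the tokenizer is a simple assumption, and the RSS items are taken as already-retrieved plain strings.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens.

    Assumption: a plain regex tokenizer; the paper's preprocessing is not
    described in the abstract.
    """
    return re.findall(r"[a-z]+", text.lower())


def build_feature_set(rss_items):
    """Collect the vocabulary of the currently retrieved RSS news items."""
    features = set()
    for item in rss_items:
        features.update(tokenize(item))
    return features


def fresh_terms(entry, feature_set):
    """Keep only the entry terms that also occur in the RSS feature set.

    Terms absent from the feature set are dropped, mirroring the rule
    stated in the abstract.
    """
    return [term for term in tokenize(entry) if term in feature_set]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    rss_items = [
        "Central bank raises interest rates",
        "Election results announced",
    ]
    feature_set = build_feature_set(rss_items)
    entry = "Interest rates are up again, lunch was great"
    print(fresh_terms(entry, feature_set))
```

In this sketch, only the terms shared with the news vocabulary survive, so an entry with no overlap contributes no features to the freshness check.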

Keywords

Microblog categorization, Short text classification, Social media, Twitter