Summarising Big Data: Common GitHub Dataset for Software Engineering Challenges

Şeker, Abdülkadir; Diri, Banu; Arslan, Halil; Amasyalı, Mehmet

doi:10.17776/csj.728932

Şeker A., Diri B., Arslan H., Amasyalı M. F.

Cumhuriyet Science Journal, vol.41, no.3, pp.720-724, 2020 (TRDizin)

Publication Type: Article / Full Article
Volume: 41 Issue: 3
Publication Date: 2020
DOI Number: 10.17776/csj.728932
Journal Name: Cumhuriyet Science Journal
Journal Indexes: Directory of Open Access Journals, TR DİZİN (ULAKBİM)
Pages: pp.720-724
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Yıldız Technical University: Yes

Abstract

In open-source software development environments, the textual, numerical, and relationship-based data generated are of interest to researchers. Various datasets are available for this data, which is frequently used in areas such as software engineering and natural language processing. However, because these datasets contain all the data in the environment, working with them means processing terabytes of data. For this reason, almost all studies that use GitHub data work with data filtered according to certain criteria. Because each study therefore uses a different dataset, comparing the accuracy of the studies is quite difficult. To solve this problem, a common dataset was created and shared with researchers, allowing work on many software engineering problems.