Unsupervised Learning Methods and Application in Adult Census Income Dataset

Uğurlu, Emel; Şaylı, Ayla

doi:10.5281/zenodo.14182242

Avrupa Bilim ve Teknoloji Dergisi, no.54, pp.127-143, 2024 (Peer-Reviewed Journal)

Publication Type: Article / Full Article
Publication Date: 2024
DOI Number: 10.5281/zenodo.14182242
Journal Name: Avrupa Bilim ve Teknoloji Dergisi
Page Numbers: pp.127-143
Yıldız Technical University Affiliated: Yes

Abstract

Unsupervised learning is a data analysis technique used to explore latent structure in data. It is the process of grouping objects with similar properties together and separating dissimilar ones, without supervision over the elements of the data. In this study, the K-Means, DBSCAN and BIRCH clustering algorithms, which are unsupervised learning methods, were applied to the Adult Census Income dataset, which has 14 attributes and whose target attribute is annual income, in Jupyter Notebook using Python 3. This dataset has generally been used for classification according to the target attribute, based on whether its value is less than 50 thousand dollars (class 0) or not (class 1). However, annual income alone may not reveal groups of similar people. The aim of this study is to find such groups by considering all the attributes in the dataset rather than annual income alone, and to compare the performance of the clustering algorithms in order to observe the effect of the optimal number of clusters on the results. We first preprocessed the dataset and named the result the Preprocessed Dataset; we then addressed the class imbalance problem in this preprocessed set with the SMOTE method and named the result the SMOTED_Preprocessed Dataset. After the algorithms were applied to the datasets, 2 and 3 clusters were found, the resulting clusters were evaluated, and the features determining the clusters were interpreted.

Keywords: Machine Learning, Unsupervised Learning Methods, K-Means, DBSCAN, BIRCH.
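
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn is available, uses synthetic data from `make_blobs` as a stand-in for the preprocessed census attributes, and omits the SMOTE balancing step. The DBSCAN parameters (`eps`, `min_samples`) and the silhouette score used for evaluation are illustrative choices, since the abstract does not state which settings or metrics were used. The helper names `cluster_all` and `silhouette_or_none` are hypothetical.

```python
from sklearn.cluster import KMeans, DBSCAN, Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed dataset (assumption: the study
# itself uses the encoded Adult Census Income attributes instead).
X, _ = make_blobs(n_samples=600, centers=3, n_features=5,
                  cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X)


def cluster_all(X, n_clusters=3):
    """Fit the three algorithms and return their label arrays by name."""
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=42),
        # eps and min_samples are illustrative, not taken from the paper.
        "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
        "BIRCH": Birch(n_clusters=n_clusters),
    }
    return {name: model.fit_predict(X) for name, model in models.items()}


def silhouette_or_none(X, labels):
    """Silhouette over non-noise points; None if fewer than 2 clusters remain."""
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    if len(set(labels[mask])) < 2:
        return None
    return silhouette_score(X[mask], labels[mask])


results = cluster_all(X, n_clusters=3)
scores = {name: silhouette_or_none(X, labels) for name, labels in results.items()}

for name, labels in results.items():
    n_found = len(set(labels) - {-1})
    print(name, "clusters found:", n_found, "silhouette:", scores[name])
```

K-Means and BIRCH take the number of clusters as an input, so rerunning `cluster_all` with `n_clusters=2` and `n_clusters=3` gives one way to compare the two cluster counts mentioned in the abstract. DBSCAN instead derives the number of clusters from the density parameters.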