Socio-economic Evaluation of Settlements with Machine Learning Approaches: The Case Study of Istanbul

Şimşir M., Geçici E.

3rd International Symposium of Scientific Research and Innovative Studies, Balıkesir, Turkey, 15 March 2023

  • Publication Type: Conference Paper / Summary Text
  • City: Balıkesir
  • Country: Turkey
  • Yıldız Technical University Affiliated: Yes


Socioeconomic status is an essential concept for understanding a citizen's position in society, and society can be divided into categories on that basis. Several factors determine socioeconomic status; education level, wealth, income level, occupation, and access to good nutrition are among the most important. The primary purpose of this study is to cluster the districts of the province of Istanbul socioeconomically using machine learning methods. In this regard, the aim is to investigate, using existing data, whether the districts have socioeconomic similarities. For this purpose, district-level data that are publicly shared on the İstanbul Metropolitan Municipality website are used for the analysis: population, average household size, number of hospitals, water consumption, domestic waste, number of public bread buffets, literacy and schooling figures (literacy unknown, literate, illiterate, preschool, primary school, secondary school), housing sales figures, number of rail stations, and number of vehicles. To analyze the variables and examine the districts from a socioeconomic point of view, the k-means method, an unsupervised learning technique, is used. In unsupervised learning, the data set contains no 𝑦 variable, that is, no response variable, and the methods are typically used to describe the data and draw inferences about it. Clustering is one of the tasks studied under unsupervised learning; it groups observations that have a similar characteristic structure. The k-means method is one such clustering method. It divides the data set into k clusters, where k is the parameter that gives the method its name.

According to the results, the population variable is dominant in the data set, and the districts are clustered according to this variable. When the population variable is removed, similar clusters are still obtained. One reason for this is that the population variable is associated with the other variables in the data set. As a result, the socioeconomic distinctions between districts reported in existing studies could not be reproduced with the current data set when a small number of clusters was used. As the number of clusters increased, however, the districts within each cluster were observed to be similar to one another in socioeconomic structure.
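
The clustering procedure described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy rows, the three columns, the z-score standardization step, and the function names `standardize`, `kmeans`, and `same_partition` are all assumptions made for illustration, and the numbers are invented rather than taken from the İstanbul Metropolitan Municipality data. The sketch runs Lloyd's k-means once with all columns and once with the population column dropped, then checks whether the two runs group the rows in the same way.

```python
import math
import random


def standardize(rows):
    """Rescale each column to zero mean and unit variance (an assumed preprocessing step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def same_partition(a, b):
    """True if two labelings group the points identically, ignoring label names."""
    pairs = range(len(a))
    return all((a[i] == a[j]) == (b[i] == b[j]) for i in pairs for j in pairs)


# Hypothetical toy rows: [population, number of hospitals, number of vehicles].
toy = [
    [900000, 12, 250000],
    [850000, 11, 240000],
    [880000, 13, 260000],
    [60000, 1, 15000],
    [50000, 2, 12000],
    [70000, 1, 18000],
]

labels = kmeans(standardize(toy), k=2)

# Repeat the clustering without the population column (column 0).
labels_no_pop = kmeans(standardize([row[1:] for row in toy]), k=2)

print(labels, labels_no_pop, same_partition(labels, labels_no_pop))
```

In this toy data the remaining columns rise and fall with population, so removing population leaves the grouping unchanged, which mirrors the explanation given in the abstract for why similar clusters are obtained. Whether standardization was applied in the study is not stated; without it, the column with the largest numeric range would dominate the Euclidean distances.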