Unveiling the impact of machine learning algorithms on the quality of online geocoding services: a case study using COVID-19 data

Kılıç, Batuhan; Bayrak, Onur; Gülgen, Fatih; Gürtürk, Mert; Abay, Perihan

doi:10.1007/s10109-023-00435-8

Kılıç B., Bayrak O. C., Gülgen F., Gürtürk M., Abay P.

Journal of Geographical Systems, vol.26, no.4, pp.601-622, 2024 (SSCI)

Publication Type: Article / Full Article
Volume: 26 Issue: 4
Publication Date: 2024
DOI Number: 10.1007/s10109-023-00435-8
Journal Name: Journal of Geographical Systems
Journal Indexes: Social Sciences Citation Index (SSCI), Scopus, Academic Search Premier, ABI/INFORM, Agricultural & Environmental Science Database, Business Source Elite, Business Source Premier, EconLit, Environment Index, Geobase, INSPEC, Civil Engineering Abstracts
Pages: pp.601-622
Keywords: Address matching, COVID-19, Geocoding, Machine learning, Random forest
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Yıldız Technical University: Yes

Abstract

Addresses are one of the key components that enable mobility in daily life. Global map platforms and location-based services use address data to pinpoint geographically referenced locations. Geocoding provided by online platforms is useful for the spatial tracking of reported cases and controls in the spatial analysis of infectious diseases such as COVID-19. The first and most critical phase of the geocoding process is address matching. However, because of typographical errors, inconsistent abbreviations, and incomplete or malformed addresses, matching can seldom be performed with 100% accuracy. The purpose of this research is to examine machine learning classifiers that can be used to measure the consistency of address matching results produced by online geocoding services, and to identify the best-performing classifier. The performance of seven machine learning classifiers was compared using several text similarity measures, which assess the match scores between the input address data and the services' output. The test data came from four distinct online geocoding services applied to 925 addresses in Türkiye. The findings revealed that the Random Forest classifier was the most accurate in the address matching procedure. While the results hold for similar datasets in Türkiye, additional research is required to determine whether they apply to data from other countries.
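
The idea of scoring a geocoding result by comparing the input address with the address a service returns can be illustrated with a minimal sketch. The abstract does not name the text similarity measures used in the study, so the three below (normalized edit distance, token Jaccard overlap, and the difflib sequence ratio) are assumptions chosen for illustration, as are the function names and the sample addresses. Only the Python standard library is used.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens.

    Note: plain lower() does not handle Turkish-specific casing rules;
    a real pipeline would need locale-aware normalization.
    """
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return cleaned.split()


def levenshtein(a: str, b: str) -> int:
    """Edit distance via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity_features(input_address: str, returned_address: str) -> dict[str, float]:
    """Hypothetical feature set: three similarity scores, each in [0, 1]."""
    in_tokens = normalize(input_address)
    out_tokens = normalize(returned_address)
    in_text, out_text = " ".join(in_tokens), " ".join(out_tokens)
    longest = max(len(in_text), len(out_text)) or 1
    union = set(in_tokens) | set(out_tokens)
    shared = set(in_tokens) & set(out_tokens)
    return {
        "edit_similarity": 1.0 - levenshtein(in_text, out_text) / longest,
        "jaccard": len(shared) / len(union) if union else 1.0,
        "sequence_ratio": SequenceMatcher(None, in_text, out_text).ratio(),
    }


if __name__ == "__main__":
    # Made-up addresses, not taken from the study's dataset.
    print(similarity_features("Ataturk Cad. No:5", "ataturk cad no 5"))
    print(similarity_features("Ataturk Cad 5", "Inonu Sok 12"))
```

Feature vectors of this kind, paired with match or no-match labels, are the sort of input a classifier such as Random Forest could be trained on. The actual features, labels, and classifier settings used in the study are not given in the abstract.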