Generating landslide archive inventories for Türkiye using web scraping and natural language processing techniques


Najatishendi E., Görüm T., FİDAN S., BALIK ŞANLI F.

Natural Hazards, vol.122, no.1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 122 Issue: 1
  • Publication Date: 2026
  • DOI: 10.1007/s11069-025-07753-8
  • Journal Name: Natural Hazards
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, IBZ Online, Environment Index, Geobase, INSPEC
  • Keywords: Geocoding, Landslide inventory, Landslides, Natural language processing, Web scraping
  • Yıldız Technical University Affiliated: Yes

Abstract

Landslides are among the most frequent natural hazards and cause significant loss of life and serious economic damage worldwide. Although many inventories have been created using different approaches to understand landslide events, these inventories are rarely updated automatically or in real time. Traditional approaches are time-consuming and labor-intensive, and reporting delays often limit their timeliness. To address these challenges, we developed an automated approach that integrates web scraping, natural language processing (NLP), and geocoding techniques, using digital media news sources in Türkiye to create a landslide archive inventory. Our algorithm verified 1727 of the 3051 news articles it captured between 1997 and 2024 as landslide reports and identified a total of 478 fatalities in 212 deadly incidents. A total of 66.5% of the landslides captured on the web were located at the neighborhood/village level, providing substantial spatial accuracy. This location accuracy also enabled risk estimation at the neighborhood/village level. A comparison with the manual national inventory revealed moderate agreement, with F1 scores of 0.434 and 0.552 for the ±1-day and ±7-day time windows, respectively. The automated method not only captures spatial and temporal patterns of landslides but also extracts key attributes such as location, number of fatalities, and triggering factors (i.e., natural and anthropogenic). Our study demonstrates the potential of web-based automated approaches to complement traditional landslide inventories by providing near-real-time and verifiable data. Finally, we suggest adopting common reporting standards for digital news coverage of natural hazards so that this approach can be applied globally.
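
The abstract compares the automated inventory with the manual national inventory using F1 scores over day-based time windows, but it does not describe the matching procedure. The Python sketch below shows one plausible way such a score could be computed. It is an illustration under stated assumptions, not the authors' method: the function name `f1_with_day_window`, the greedy one-to-one matching rule, the use of a place name as the spatial key, and the toy events are all hypothetical.

```python
from datetime import date


def f1_with_day_window(detected, reference, window_days):
    """F1 score for two event lists of (place, date) tuples.

    Assumption (not from the paper): a detected event counts as a true
    positive if an unmatched reference event exists at the same place
    within +/- window_days; each reference event is matched at most once.
    """
    unmatched = list(reference)
    true_positives = 0
    for place, day in detected:
        # Collect unmatched reference events at the same place inside the window.
        candidates = [
            (abs((day - ref_day).days), index)
            for index, (ref_place, ref_day) in enumerate(unmatched)
            if ref_place == place and abs((day - ref_day).days) <= window_days
        ]
        if candidates:
            # Greedily consume the closest reference event in time.
            _, best_index = min(candidates)
            unmatched.pop(best_index)
            true_positives += 1

    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up events for illustration only (not data from the study).
detected_events = [
    ("Rize", date(2020, 7, 14)),
    ("Trabzon", date(2021, 3, 2)),
    ("Artvin", date(2019, 5, 10)),
]
reference_events = [
    ("Rize", date(2020, 7, 13)),
    ("Trabzon", date(2021, 3, 7)),
    ("Giresun", date(2018, 8, 1)),
]

for window in (1, 7):
    score = f1_with_day_window(detected_events, reference_events, window)
    print(f"±{window} days: F1 = {score:.3f}")
```

In this toy example the wider window admits one additional match, so the score rises with the window size, which is the same direction of change the abstract reports between the ±1-day and ±7-day windows.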