Explainable step-wise binary classification for the susceptibility assessment of geo-hydrological hazards

Ekmekcioğlu Ö., Koç K.

CATENA, vol.216, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 216
  • Publication Date: 2022
  • Doi Number: 10.1016/j.catena.2022.106379
  • Journal Name: CATENA
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aquatic Science & Fisheries Abstracts (ASFA), BIOSIS, CAB Abstracts, Environment Index, Veterinary Science Database, DIALNET, Civil Engineering Abstracts
  • Keywords: Flooding, Landslides, Disaster management, Explainable artificial intelligence, Geo-hydrological hazards, SHAP, ARTIFICIAL-INTELLIGENCE APPROACH, LANDSLIDE SUSCEPTIBILITY, BIVARIATE STATISTICS, FLOOD, MACHINE, PREDICTION, OPTIMIZATION, ALGORITHM, MODEL, TREES
  • Yıldız Technical University Affiliated: Yes


This research proposes a novel step-wise binary prediction framework for the susceptibility assessment of geo-hydrological hazards, specifically floods and landslides. The framework comprises two major steps: prediction of geo-hydrological hazard-prone locations (Step-1: hazard/non-hazard), and classification of geo-hydrological hazards by identifying the locations of floods and landslides separately (Step-2: floods/landslides). We used 1326 historically experienced hazard locations (i.e., 726 for floods and 690 for landslides) in the Kentucky River basin, United States, along with 13 hazard conditioning factors. Extremely randomized trees (ERT) coupled with particle swarm optimization (PSO) were adopted to provide an effective classification scheme. Based on the ERT-PSO predictions in the first step, the correctly classified hazard instances were carried into the second step of the prediction task. The results revealed strong agreement between the predicted and observed hazard locations, with AUROC values of 0.8032 and 0.8845 for the geo-hydrological hazard (Step-1) and flood/landslide (Step-2) classifications, respectively. At Step-1, 73.78% of the hazard class and 72.91% of the non-hazard class were correctly identified, while at Step-2, 72.31% of the flooding points and 84.85% of the landslide points were correctly identified. The Step-1 findings indicated that nearly 10% of the entire basin is susceptible to geo-hydrological hazards with very high probability, whereas areas of very low susceptibility cover only 20% of the basin. A model-agnostic, game-theory-based SHapley Additive exPlanations (SHAP) algorithm was employed to examine the contribution of the hazard conditioning factors to the predictions, which increases the interpretability of the adopted methodology. The holistic approach adopted in the present research has significant potential to provide insights for both the practical and theoretical sides of the literature.
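
The hand-off between the two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` record, the threshold rules `step1_predict` and `step2_predict` (hypothetical stand-ins for the trained ERT-PSO models), and the six toy points are all assumptions made for illustration. The sketch only shows how correctly classified hazard instances from Step-1 are passed to Step-2, and how a per-class correct-identification rate of the kind reported in the abstract can be computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Sample:
    features: Sequence[float]
    is_hazard: bool               # Step-1 ground truth (hazard / non-hazard)
    hazard_type: Optional[str]    # "flood" or "landslide" for hazards, else None


# Hypothetical stand-ins for the two trained classifiers (NOT ERT-PSO models).
def step1_predict(features: Sequence[float]) -> bool:
    return features[0] > 0.5


def step2_predict(features: Sequence[float]) -> str:
    return "flood" if features[1] > 0.5 else "landslide"


def select_for_step2(samples: List[Sample],
                     predict: Callable[[Sequence[float]], bool]) -> List[Sample]:
    """Keep only true hazard samples that Step-1 also predicted as hazards."""
    return [s for s in samples if s.is_hazard and predict(s.features)]


def per_class_rate(y_true: list, y_pred: list, label) -> float:
    """Share of instances whose true class is `label` that were predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits) if hits else float("nan")


# Toy data, invented purely for illustration.
samples = [
    Sample((0.9, 0.8), True, "flood"),
    Sample((0.7, 0.2), True, "landslide"),
    Sample((0.3, 0.9), True, "flood"),       # hazard missed by Step-1
    Sample((0.8, 0.6), True, "landslide"),   # passes Step-1, mislabelled in Step-2
    Sample((0.1, 0.1), False, None),
    Sample((0.6, 0.4), False, None),         # non-hazard flagged as hazard
]

# Step-1: hazard vs. non-hazard.
s1_true = [s.is_hazard for s in samples]
s1_pred = [step1_predict(s.features) for s in samples]
hazard_rate = per_class_rate(s1_true, s1_pred, True)
non_hazard_rate = per_class_rate(s1_true, s1_pred, False)

# Step-2: flood vs. landslide, restricted to correctly classified hazards.
step2_samples = select_for_step2(samples, step1_predict)
s2_true = [s.hazard_type for s in step2_samples]
s2_pred = [step2_predict(s.features) for s in step2_samples]
flood_rate = per_class_rate(s2_true, s2_pred, "flood")
landslide_rate = per_class_rate(s2_true, s2_pred, "landslide")

print(f"Step-1 hazard: {hazard_rate:.2%}, non-hazard: {non_hazard_rate:.2%}")
print(f"Step-2 flood: {flood_rate:.2%}, landslide: {landslide_rate:.2%}")
```

In this sketch the Step-2 rates are computed only over the points that survive the Step-1 filter, so they are conditional on a correct Step-1 hazard prediction; whether the published percentages were computed on exactly that basis is not stated in the abstract.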