Exploring the additional value of class imbalance distributions on interpretable flash flood susceptibility prediction in the Black Warrior River basin, Alabama, United States

Ekmekcioğlu Ö., KOÇ K., Özger M., IŞIK Z.

Journal of Hydrology, vol.610, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 610
  • Publication Date: 2022
  • Doi Number: 10.1016/j.jhydrol.2022.127877
  • Journal Name: Journal of Hydrology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), Arctic & Antarctic Regions, BIOSIS, CAB Abstracts, Communication Abstracts, Environment Index, INSPEC, Metadex, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: Associate Editor, Artificial intelligence, Flash flood susceptibility, Geographic information system (GIS), Imbalance data, Machine learning, SHapley Additive exPlanations (SHAP), Flood risk management, ARTIFICIAL-INTELLIGENCE, RANDOM FOREST, TREES
  • Yıldız Technical University Affiliated: Yes


© 2022 Elsevier B.V. This study proposes a novel flash flood susceptibility prediction framework with particular emphasis on the extent of imbalance between the number of flooding and non-flooding events, as the majority of events result in non-flooding. The class imbalance issue and the magnitude of the imbalance were explored to highlight the uncertain nature of the flooding phenomenon. To this end, the Random Forest (RF) was initially adopted to evaluate five imbalanced class distribution scenarios (i.e., 1x, 10x, 25x, 50x, and 100x non-flood events for every x flood events). Parameter configurations of the developed models were determined with a state-of-the-art metaheuristic, the Cuckoo Search (CS) algorithm. The CS-RF model showed the highest prediction capability (0.8455) with regard to the area under the receiver operating characteristic curve (AUROC) when the extent of imbalance was set to 50x. The CS-RF model was then benchmarked against another bagging algorithm, i.e., Extra Trees, and two boosting algorithms, i.e., Adaptive Boosting (AdaBoost) and eXtreme Gradient Boosting (XGBoost), all integrated with the CS technique. The results showed that CS-RF is the most promising tree-based machine learning technique for flash flood susceptibility projection in the selected study area. Based on the predictions, a flash flood susceptibility map was generated, in which 9.35% of the basin was under very high flash flood risk. A recently developed model-agnostic game-theoretical method, SHapley Additive exPlanations (SHAP), was used to anatomize the flash flood conditioning factors and to highlight the contribution of each feature to the incident outcome prediction, ensuring the transparency of the model findings.
Overall, this study contributes to both theory and practice, with a particular focus on model interpretability and the existence of imbalance in the occurrence of flash flood events, assisting decision-makers in enhancing strategies to combat the hazardous impacts of floods.
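
The imbalance scenarios described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CS-RF pipeline: the susceptibility scores are synthetic and hypothetical, no model is trained or tuned, and the helper names `make_scenario` and `auroc` are introduced here only for illustration. The sketch shows how the five non-flood-to-flood ratios could be assembled by subsampling and how AUROC can be computed as the probability that a flood sample receives a higher score than a non-flood sample.

```python
import random


def make_scenario(flood, non_flood, ratio, rng):
    """Keep every flood sample and draw `ratio` non-flood samples per flood sample."""
    n_neg = min(len(non_flood), ratio * len(flood))
    return flood, rng.sample(non_flood, n_neg)


def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


rng = random.Random(0)

# Hypothetical susceptibility scores standing in for model output
# (assumption: flood locations tend to receive somewhat higher scores).
flood_scores = [rng.gauss(0.65, 0.15) for _ in range(40)]
non_flood_scores = [rng.gauss(0.45, 0.15) for _ in range(4000)]

results = {}
for ratio in (1, 10, 25, 50, 100):
    pos, neg = make_scenario(flood_scores, non_flood_scores, ratio, rng)
    results[ratio] = (len(pos), len(neg), auroc(pos, neg))
    print(f"{ratio:>3}x: {len(pos)} flood, {len(neg)} non-flood, "
          f"AUROC = {results[ratio][2]:.4f}")
```

In the study itself, each scenario would be used to train and evaluate a separate model, so the AUROC differences between ratios reflect model behaviour rather than the sampling noise shown here.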