Prediction of construction accident outcomes based on an imbalanced dataset through integrated resampling techniques and machine learning methods

Koç K., Ekmekcioğlu Ö., Gürgün A. P.

Engineering, Construction and Architectural Management, vol.30, no.9, pp.4486-4517, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 30 Issue: 9
  • Publication Date: 2023
  • DOI Number: 10.1108/ecam-04-2022-0305
  • Journal Name: Engineering, Construction and Architectural Management
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, ABI/INFORM, Aerospace Database, Business Source Elite, Business Source Premier, Communication Abstracts, ICONDA Bibliographic, Index Islamicus, INSPEC, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.4486-4517
  • Keywords: Artificial intelligence, Construction safety, Machine learning, Occupational health and safety (OHS), Occupational accidents, Safety management, OCCUPATIONAL-HEALTH, SAFETY MANAGEMENT, PERFORMANCE, CLASSIFICATION, VARIABLES, PATTERNS, INDUSTRY, SYSTEM, KOREA
  • Yıldız Technical University Affiliated: Yes


© 2022, Emerald Publishing Limited.

Purpose: Central to the entire discipline of construction safety management is the concept of construction accidents. Although distinctive progress has been made in safety management applications over the last decades, the construction industry still accounts for a considerable percentage of all workplace fatalities across the world. This study aims to predict occupational accident outcomes based on national data using machine learning (ML) methods coupled with several resampling strategies.

Design/methodology/approach: An occupational accident dataset recorded in Turkey was collected. To deal with the class imbalance between the number of nonfatal and fatal accidents, the dataset was pre-processed with random under-sampling (RUS), random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). In addition, random forest (RF), Naïve Bayes (NB), K-nearest neighbor (KNN) and artificial neural networks (ANNs) were employed as ML methods to predict accident outcomes.

Findings: The results highlighted that RF outperformed the other methods when the dataset was pre-processed with RUS. The permutation importance results obtained through RF showed that the number of past accidents in the company, the worker's age, the material used, the number of workers in the company, the accident year and the time of the accident were the most significant attributes.

Practical implications: The proposed framework can be used on construction sites on a monthly basis to detect workers who have a high probability of experiencing fatal accidents, which can be a valuable decision-making input for safety professionals seeking to reduce the number of fatal accidents.

Social implications: Practitioners and the occupational health and safety (OHS) departments of construction firms can focus on the most important attributes identified by the analysis to enhance workers' quality of life and well-being.

Originality/value: In the construction safety domain, the literature on accident outcome prediction is limited in terms of dealing with imbalanced datasets through integrated resampling techniques and ML methods. A novel utilization plan was proposed and enhanced by the analysis results.
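The three resampling strategies named in the abstract (RUS, ROS and SMOTE) can be illustrated with a minimal, dependency-free Python sketch. The function names (`random_under_sample`, `random_over_sample`, `smote`), the toy feature values and the simplified SMOTE variant (numeric features only, one randomly chosen minority base row per synthetic point) are assumptions for illustration, not the authors' implementation. Categorical attributes such as the material used would first need numeric encoding or a variant such as SMOTE-NC. In practice, the imbalanced-learn library provides `RandomUnderSampler`, `RandomOverSampler` and `SMOTE` for the same purpose.

```python
import random
from collections import Counter


def split_by_class(X, y):
    """Group feature rows by class label, preserving first-seen label order."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(X, y, seed=0):
    """RUS: keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """ROS: duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority, k=5, seed=0):
    """Simplified SMOTE for numeric features: interpolate between a random
    minority row and one of its k nearest minority neighbours.
    Requires at least two minority rows."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    pool = groups[minority]
    n_needed = max(len(rows) for rows in groups.values()) - len(pool)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(pool))
        others = [j for j in range(len(pool)) if j != i]
        neighbours = sorted(others, key=lambda j: sq_dist(pool[i], pool[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(pool[i], pool[j])])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


# Toy data with two hypothetical numeric features; label 1 = fatal (minority).
X = [[0, 2], [1, 3], [0, 1], [2, 2], [1, 4], [0, 3], [5, 4], [7, 5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

for name, (Xr, yr) in {
    "RUS": random_under_sample(X, y),
    "ROS": random_over_sample(X, y),
    "SMOTE": smote(X, y, minority=1),
}.items():
    print(name, len(Xr), Counter(yr))
```

RUS shrinks the data to twice the minority count, whereas ROS and SMOTE grow it to twice the majority count; only SMOTE creates new, non-duplicate minority points. Resampling is normally applied to the training split only, so that evaluation still reflects the real class ratio.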
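The permutation importance ranking reported in the Findings can be sketched in a model-agnostic way: shuffle one attribute at a time and measure how much the model's score drops. In this sketch, `toy_predict` is a hypothetical rule standing in for the trained RF, the feature values are invented, and accuracy is an assumed default score; for imbalanced outcomes a metric such as recall on the fatal class may be more informative and can be passed through the `score` parameter. scikit-learn offers a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, score=accuracy, n_repeats=10, seed=0):
    """Mean drop in `score` when each feature column is shuffled in turn.

    Shuffling one column breaks its relationship with the label while leaving
    the other columns intact, so a large drop means the model relies on it.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained random forest: predicts "fatal" (1)
# when feature 0 exceeds 3 and ignores feature 1 entirely.
def toy_predict(X):
    return [1 if row[0] > 3 else 0 for row in X]


X = [[0, 5], [1, 3], [2, 8], [5, 1], [6, 7], [7, 2]]
y = [0, 0, 0, 1, 1, 1]
print(permutation_importance(toy_predict, X, y))
```

Because `toy_predict` never reads feature 1, shuffling it leaves every prediction unchanged and its importance is exactly zero, while shuffling feature 0 lowers accuracy and yields a positive importance.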