Scenario-based automated data preprocessing to predict severity of construction accidents


Automation in Construction, vol.140, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 140
  • Publication Date: 2022
  • DOI Number: 10.1016/j.autcon.2022.104351
  • Journal Name: Automation in Construction
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Communication Abstracts, ICONDA Bibliographic, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Automated pre-processing, Accident risk assessment, Occupational health and safety (OHS), Accident severity, Machine learning, Artificial intelligence, eXtreme gradient boosting (XGBoost), OCCUPATIONAL ACCIDENTS, CLASSIFICATION, GENERATION, INCIDENTS, EQUIPMENT, INDUSTRY, WORKERS
  • Yıldız Technical University Affiliated: Yes


© 2022 Elsevier B.V. Occupational accidents are common in the construction industry, so prediction models that detect high-severity accidents would be useful. However, existing studies are limited and usually focus on selecting the most appropriate machine learning method rather than identifying the most effective preprocessing pipeline to apply before prediction. In this study, a scenario-based automated preprocessing model that identifies the best scenario is developed to predict the severity of construction accidents. The results show that the scenario combination of not removing missing data, not applying data binning, considering outliers, applying Min-Max scaling and one-hot encoding, and resampling the data with random oversampling yielded the highest prediction performance, with an F1-score of 0.6092. Permutation importance analysis with XGBoost indicates that year, cause material, age, past accidents, experience, and salary are the most influential attributes. This study contributes to society and practice with a model that helps prevent high-severity accidents, and to theory and technology with a novel preprocessing model that supports more reliable predictions.
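
The scenario-based search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The option grid below is hypothetical: only the first value of each entry is reported in the abstract as part of the best scenario, while the alternative values ("none", "ordinal") and all function names are assumptions made for illustration. Min-Max scaling and random oversampling are written in plain Python so the sketch stays self-contained.

```python
import itertools
import random
from collections import Counter

# Hypothetical option grid. The first value of each entry is the one the
# abstract reports for the best scenario; the alternatives are assumptions.
SCENARIO_GRID = {
    "remove_missing": [False, True],
    "binning": [False, True],
    "outliers": ["considered", "not_considered"],
    "scaler": ["minmax", "none"],
    "encoding": ["onehot", "ordinal"],
    "resampling": ["random_over", "none"],
}


def enumerate_scenarios(grid):
    """Yield every combination of preprocessing options as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Each scenario would be applied to the data, a model trained, and the
# F1-score compared across scenarios to pick the best one.
scenarios = list(enumerate_scenarios(SCENARIO_GRID))
```

With two assumed options per step, the grid yields 64 scenarios; the number of scenarios in the actual study is not stated in the abstract. In practice, library implementations such as scikit-learn's `MinMaxScaler` and imbalanced-learn's `RandomOverSampler` would likely replace the hand-written helpers.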