Automation in Construction, vol.140, 2022 (SCI-Expanded)
© 2022 Elsevier B.V. Occupational accidents are common in the construction industry, so prediction models that detect high-severity accidents would be useful. However, existing studies are limited and usually focus on selecting the most appropriate machine learning method rather than identifying the most effective preprocessing pipeline to apply before prediction. In this study, a scenario-based automated preprocessing model that identifies the best scenario is developed to predict the severity of construction accidents. The results show that the scenario combining no removal of missing data, no data binning, considering outliers, Min-Max scaling, one-hot encoding, and data resampling with random oversampling yielded the highest prediction performance, with an F1-score of 0.6092. Permutation importance analysis of the XGBoost model indicates that year, cause material, age, past accidents, experience, and salary are the most influential attributes. This study contributes to society and practice through a model for preventing high-severity accidents, and to theory and technology through a novel preprocessing model that enables more reliable predictions.
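To make the best-performing scenario concrete, the following is a minimal sketch of three of its steps (Min-Max scaling, one-hot encoding, and random oversampling) in plain Python. It is not the authors' implementation: the helper names, the toy records, and the column choices are hypothetical and serve only to illustrate the techniques named in the abstract.

```python
import random
from collections import Counter


def min_max_scale(values):
    """Rescale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator vectors.

    Categories are ordered alphabetically so the output is deterministic.
    """
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy records (assumed for illustration, not from the study).
ages = [25, 40, 55, 33, 61]
materials = ["scaffold", "ladder", "scaffold", "crane", "ladder"]
severity = ["low", "low", "low", "low", "high"]

scaled_age = min_max_scale(ages)
encoded, categories = one_hot(materials)

# One feature row per record: scaled age followed by the indicator columns.
features = [[a] + e for a, e in zip(scaled_age, encoded)]

# Balance the classes by resampling the minority class with replacement.
X_res, y_res = random_oversample(features, severity)
```

In a real workflow, oversampling would normally be applied to the training split only, so that duplicated records do not leak into the evaluation data.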