Application of random forest-based decision tree approach for modeling fully developed turbulent flow in rough pipes

Yetilmezsoy K.

Fluid Mechanics Research International Journal, vol.5, pp.4-16, 2023 (Peer-Reviewed Journal)


A random forest (RF)-based decision tree programming methodology was proposed for modeling fully developed turbulent flow conditions in sizing problems (Type 3) of pipe distribution systems. In the present computational study, a flexible RF-based soft-computing strategy was applied to estimate the required pipe diameter (D) and the Darcy–Weisbach friction factor (λ or f) from five basic pipeline design variables: absolute roughness of the pipe wall (ε = 0.01–10 mm), water temperature (T = 5–30 °C), pipe length (L = 30–2000 m), flow rate (Q = 0.001–3 m³/s), and head loss (Δh = 1–90 m). The prediction performance of the implemented RF-based model was assessed using more than 15 statistical goodness-of-fit parameters (e.g., determination coefficient (R²), mean absolute error (MAE), root mean squared error (RMSE), standard error of the estimate (SEE), index of agreement (IA) or Willmott’s Index (WI), the factor of two (FA2), fractional variance (FV), proportion of systematic error (PSE), coefficient of variation of RMSE (CV(RMSE)) or scattering index (SI), Nash–Sutcliffe efficiency (NSE), Legates and McCabe’s index (LMI), Akaike information criterion (AIC), U95, and so forth) and graphical tools such as box-and-whisker plots and spread plots. The statistical metrics corroborated the strong performance of the RF-based approach in predicting both the required pipe diameter (R² = 0.9793, MAE = 0.0287 m, RMSE = 0.03833 m, SEE = 0.0326 m, IA or WI = 0.9933, CV(RMSE) or SI = 0.0595, NSE = 0.9753, LMI = 0.8482, and AIC = -1954.6438 for the testing dataset) and the friction factor (R² = 0.9576, MAE = 0.0011, RMSE = 0.0023, SEE = 0.0018, IA or WI = 0.9851, CV(RMSE) or SI = 0.0660, NSE = 0.9478, LMI = 0.8500, and AIC = -3646.7124 for the testing dataset). The proposed RF-based model was also tested against datasets obtained from the relevant literature.
The validation results indicated that the applied decision tree-based method produced realistic estimations and acceptable statistics (i.e., R² = 0.9624, MAE = 0.0598 m, and RMSE = 0.0708 m for D values, and R² = 0.9130, MAE = 0.0043, and RMSE = 0.0052 for λ values) even at extreme L values. This study demonstrated the ability of the applied soft-computing strategy to predict D and λ values accurately and to eliminate the error-prone steps of the traditional iterative approach.
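For context, the traditional iterative approach mentioned above can be sketched as follows. This is a minimal illustration, not the procedure from the paper: it assumes the standard Darcy–Weisbach head-loss relation and the Colebrook–White equation for the friction factor, neither of which is spelled out in the abstract. The function names (`colebrook`, `size_pipe`), the initial guesses, and the default kinematic viscosity (about 1.0e-6 m²/s, roughly water at 20 °C) are assumptions made for the example.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def colebrook(re, rel_rough, tol=1e-12, max_iter=100):
    """Darcy friction factor from Colebrook-White (assumed closure, not from the paper).

    Solves x = -2*log10(rel_rough/3.7 + 2.51*x/re) with x = 1/sqrt(f)
    by fixed-point iteration.
    """
    x = 7.0  # initial guess, corresponds to f of about 0.02
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_rough / 3.7 + 2.51 * x / re)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / x**2


def size_pipe(q, length, head_loss, eps, nu=1.0e-6, tol=1e-10, max_iter=100):
    """Type 3 (sizing) problem: find D and f from Q, L, head loss and roughness.

    q in m^3/s, length and head_loss in m, eps (absolute roughness) in m,
    nu (kinematic viscosity) in m^2/s. All defaults are illustrative assumptions.
    """
    f = 0.02  # initial guess for the friction factor
    for _ in range(max_iter):
        # Darcy-Weisbach with V = 4Q/(pi D^2): h = 8 f L Q^2 / (pi^2 g D^5)
        d = (8.0 * f * length * q**2 / (math.pi**2 * G * head_loss)) ** 0.2
        re = 4.0 * q / (math.pi * d * nu)  # Reynolds number
        f_new = colebrook(re, eps / d)
        converged = abs(f_new - f) < tol
        f = f_new
        if converged:
            break
    d = (8.0 * f * length * q**2 / (math.pi**2 * G * head_loss)) ** 0.2
    return d, f


if __name__ == "__main__":
    d, f = size_pipe(q=0.1, length=500.0, head_loss=10.0, eps=1.0e-4)
    print(f"D = {d:.4f} m, f = {f:.5f}")
```

Each outer pass updates D from the current friction factor and then re-solves for the friction factor at the new Reynolds number and relative roughness; this repeated guess-and-correct loop is the kind of step that a trained regression model replaces with a direct prediction.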