BIOENERGY RESEARCH, vol.18, no.1, 2025 (SCI-Expanded, Scopus)
The growing demand for sustainable energy calls for efficient and accurate methods to optimize biofuel production processes. Hydrothermal liquefaction (HTL) is a promising thermochemical technique for converting wet biomass into biocrude oil, but estimating yield across diverse feedstocks and operating conditions remains challenging. In this study, we develop and benchmark a series of machine learning (ML) models to predict biocrude oil yield from HTL, using a comprehensive dataset of 650 biomass samples and process parameters, including elemental composition and higher heating value (HHV). Notably, this is the first study to incorporate HHV as a predictive feature at this scale. Seven ML models, including XGBoost, Random Forest, and Gaussian Process Regressor, were optimized via Bayesian hyperparameter tuning and evaluated through a dual-validation strategy combining tenfold cross-validation with a hold-out test set. XGBoost achieved the highest performance (R² = 0.97, RMSE = 0.033). To ensure model interpretability, SHAP and SAGE techniques were applied, identifying HHV, carbon content, and pressure as key yield predictors. These results provide a transparent, data-driven framework for improving reactor design and feedstock selection in bio-oil production systems. The study underscores the potential of interpretable ML to advance the predictive capabilities of renewable fuel technologies.
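The dual-validation strategy mentioned above (tenfold cross-validation combined with a hold-out test set, scored with R² and RMSE) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the 20% hold-out fraction, the single-feature least-squares line used as a stand-in for the tuned models, the synthetic data, and all function names.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (hypothetical stand-in for a real model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def predict(model, xs):
    a, b = model
    return [a * x + b for x in xs]

def k_fold_indices(n, k):
    """Split positions 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def dual_validation(xs, ys, k=10, test_fraction=0.2, seed=0):
    """k-fold CV on a development set, then one final check on a hold-out set."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    n_test = int(round(test_fraction * len(order)))  # assumed 20% hold-out
    test_idx, dev_idx = order[:n_test], order[n_test:]

    cv_scores = []
    for fold in k_fold_indices(len(dev_idx), k):
        val = [dev_idx[i] for i in fold]
        held = set(val)
        train = [i for i in dev_idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = predict(model, [xs[i] for i in val])
        cv_scores.append(r2([ys[i] for i in val], preds))

    # Refit on the full development set; the hold-out set is touched only once.
    final = fit_line([xs[i] for i in dev_idx], [ys[i] for i in dev_idx])
    y_test = [ys[i] for i in test_idx]
    p_test = predict(final, [xs[i] for i in test_idx])
    return {
        "cv_r2_mean": sum(cv_scores) / len(cv_scores),
        "test_r2": r2(y_test, p_test),
        "test_rmse": rmse(y_test, p_test),
        "n_dev": len(dev_idx),
        "n_test": len(test_idx),
    }

# Synthetic data only (NOT the study's dataset): one feature, noisy linear target.
rng = random.Random(42)
feature = [rng.uniform(10.0, 30.0) for _ in range(650)]
target = [0.02 * x + 0.05 + rng.gauss(0.0, 0.03) for x in feature]
report = dual_validation(feature, target)
print(report)
```

Keeping the hold-out set out of the cross-validation loop means the final score is not influenced by model selection; in a real workflow, hyperparameter tuning would run inside the development set only.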