Development and performance comparison of optimized machine learning-based regression models for predicting energy-related carbon dioxide emissions

Koca Akkaya E., AKKAYA A. V.

Environmental science and pollution research international, vol.30, no.58, pp.122381-122392, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 30 Issue: 58
  • Publication Date: 2023
  • Doi Number: 10.1007/s11356-023-30955-1
  • Journal Name: Environmental science and pollution research international
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, IBZ Online, ABI/INFORM, Aerospace Database, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), BIOSIS, CAB Abstracts, EMBASE, Environment Index, Geobase, MEDLINE, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Page Numbers: pp.122381-122392
  • Keywords: CO2 emissions, Machine learning, Modeling, Prediction, Regression
  • Yıldız Technical University Affiliated: Yes


Accurate prediction of national CO2 emissions has become a crucial task in decision-making for planning energy conversion and usage, supporting the design of effective emission reduction strategies, and helping to achieve a sustainable, low-carbon future. This study therefore aims to develop a general machine learning regression model that predicts the national CO2 emissions of each country with high accuracy, using data from 68 countries. Nine prediction models were developed using three machine learning methods (Support Vector Regression, Ensemble of Trees, and Gaussian Process Regression), and their prediction performances were compared. The hyperparameters of these three methods were tuned by Bayesian optimization to improve prediction performance. The test results showed that the optimized Gaussian Process Regression model (MSE = 106.68, RMSE = 10.328, MAE = 4.904, MAPE = 3.38%, R2 = 0.9998) was the best prediction model among all the developed models. This model also gave robust results in predicting CO2 emissions for many individual countries, indicating that it is a promising prediction model that can be used reliably and with high accuracy.
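The five test metrics reported in the abstract (MSE, RMSE, MAE, MAPE, R2) follow standard definitions, and a minimal Python sketch of how they are typically computed may help readers interpret the numbers. The function name `regression_metrics` and the sample values are hypothetical and chosen for illustration only; they are not the study's code or data, and the abstract does not state which software the authors used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R2 for paired sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n      # mean squared error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error
    # MAPE divides by the observed value, so it is undefined when any
    # observed value is zero (this sketch would raise ZeroDivisionError).
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                           # coefficient of determination

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical values, for illustration only (not data from the study).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
```

Because RMSE is the square root of MSE, the two reported values can be checked against each other: the square root of 106.68 is approximately 10.3286, which agrees with the reported RMSE of 10.328 up to rounding.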