Sensors, vol. 25, no. 4, 2025 (SCI-Expanded)
Predicting traffic speed is an important problem, especially in urban areas. Accurate long-term forecasts would help individuals save time and money while reducing air pollution. Despite extensive research on this subject, to our knowledge no publication investigates or addresses imbalanced datasets in traffic speed prediction. Traffic speed data are often skewed toward high values because low traffic speeds are infrequent. Two temporal factors are particularly relevant to low-speed values: daily population movement, captured by the time of day, and weather conditions, captured by the month. Both are considered in this study. We devised Hour-wise Pattern Organization and Month-wise Pattern Organization techniques, which organize the speed data using these two factors as a metric in order to better represent the characteristics of minority data. In addition, we propose a Speed-wise Pattern Organization strategy, which arranges training and test samples by setting boundaries on speed while taking the volatile nature of traffic into account. We evaluated these strategies using four popular model types: long short-term memory (LSTM) networks, gated recurrent unit (GRU) networks, bi-directional LSTM, and convolutional neural networks (CNNs). GRU performed best, achieving a mean absolute percentage error (MAPE) of 13.51%, whereas LSTM performed worst, with a MAPE of 13.74%. Our experiments confirmed the robustness of the strategies, with improvements in model accuracy across all categories. While the average improvement was approximately 4%, the strategies were most effective in low-traffic-speed scenarios, improving model prediction accuracy by 11.2%.
Because the methodologies presented in this study are applied during pre-processing, they can be combined with various models and additional pre-processing procedures to attain comparable performance improvements.
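For readers unfamiliar with the metric, the following is a minimal Python sketch of MAPE, reported both overall and separately for low-speed samples. It is an illustration only and not the code used in the study: the function names and the `low_threshold` value of 30 are hypothetical, since the abstract does not state the speed boundaries used.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mape_by_speed_band(actual, predicted, low_threshold=30.0):
    """Report MAPE overall and for the low- and high-speed subsets.

    low_threshold is a hypothetical boundary chosen for illustration;
    the study's actual speed boundaries are not given in the abstract.
    """
    pairs = list(zip(actual, predicted))
    low = [(a, p) for a, p in pairs if a < low_threshold]
    high = [(a, p) for a, p in pairs if a >= low_threshold]

    def band_mape(band):
        # An empty band has no defined error, so report None.
        if not band:
            return None
        return mape([a for a, _ in band], [p for _, p in band])

    return {
        "overall": mape(actual, predicted),
        "low": band_mape(low),
        "high": band_mape(high),
        "low_share": len(low) / len(pairs),
    }
```

Reporting the error per speed band, rather than only the overall figure, makes the effect of imbalance visible: when low-speed samples are a small share of the data, a large error on them barely moves the overall MAPE.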