Water Level Predictions at Both Entrances of a Sea Strait by Using Machine Learning

Altaş, Furkan; ÖZTÜRK, Mehmet

doi:10.3390/w16162335

Altaş F., ÖZTÜRK M.

Water (Switzerland), vol. 16, no. 16, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16 Issue: 16
Publication Date: 2024
DOI: 10.3390/w16162335
Journal Name: Water (Switzerland)
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), CAB Abstracts, Compendex, Environment Index, Food Science & Technology Abstracts, Geobase, INSPEC, Pollution Abstracts, Veterinary Science Database, Directory of Open Access Journals
Keywords: Bosphorus, feature selection, machine learning, regression, test, training, water level prediction
Affiliated with Yıldız Technical University: Yes

Abstract

In this study, we employed a novel machine learning (ML) methodology to predict water levels (WLs) from their constituent components at both entrances of a sea strait, namely the Bosphorus. The main components of WLs in the strait are mean sea level pressure (MSLP), wind speeds (W, U, V), discharges from the Danube River (Q), and tidal conditions (T). We applied the t-test, SFS, PCA, and VIF analyses and considered a range of ML techniques (Linear Regression (LR), Regression Trees (RT), Support Vector Machine Regression (SVMR), Gaussian Process Regression (GPR), and Artificial Neural Networks (ANNs)) in order to reduce the number of predictors and obtain the most flexible and accurate regression model. As a result, MSLP, W, and Q were retained, while the tide (T) was excluded. The order of importance in the optimal regression model was Q_lagged, MSLP, V_lagged, and U at the north entrance, and MSLP, Q_lagged, U, and V at the south entrance. Separate models were trained using 80%, 50%, and 33% of the data. The model trained on 80% of the data yielded the most accurate predictions, with a correlation coefficient of R ≅ 0.95 and a root mean square error (RMSE) of 0.02 m. This model demonstrated a markedly superior predictive capacity compared to previous studies in the region, which is attributed to two factors that are regarded as the novelty of the study. The first factor was the random selection of training data from each month of the year, which allowed the general pattern of WL behaviour to be represented. The second factor was the selection of the physically most meaningful inputs, chosen according to the results of the significance and multicollinearity checks.
Furthermore, the predicted and measured WLs were each employed as boundary conditions in a hydrodynamic model to assess how closely the currents in the strait computed from predicted WLs match those computed from observed WLs. The 80% data-trained model produced current velocities similar to those obtained with the observed WLs, whereas the 50% and 30% data-trained models yielded slightly different results.
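
The abstract names the random selection of training data from each month of the year as one source of the model's accuracy. The Python sketch below illustrates one way such a month-wise random split could be drawn, together with the two reported metrics (RMSE and the correlation coefficient R). It is only an illustration under stated assumptions: the function names, the synthetic month labels, and the fixed seed are hypothetical and are not taken from the paper, whose actual implementation is not described in this record.

```python
import math
import random
from collections import defaultdict


def monthly_stratified_split(months, train_fraction, seed=0):
    """Draw train_fraction of the samples at random from each month.

    months: month label (1-12) of every sample, in time order.
    Returns (train_idx, test_idx) as sorted lists of sample indices.
    Hypothetical helper, not the authors' code.
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for idx, month in enumerate(months):
        by_month[month].append(idx)

    train_idx = []
    for month in sorted(by_month):
        indices = by_month[month]
        n_train = max(1, round(train_fraction * len(indices)))
        train_idx.extend(rng.sample(indices, n_train))

    train_set = set(train_idx)
    test_idx = [i for i in range(len(months)) if i not in train_set]
    return sorted(train_idx), test_idx


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient R between two sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic example (assumption): 30 samples for each of the 12 months.
months = [m for m in range(1, 13) for _ in range(30)]
train_idx, test_idx = monthly_stratified_split(months, train_fraction=0.8)
```

Because the sample is drawn within each month, every month contributes the same share of its records to the training set, so seasonal behaviour is represented even when the training fraction is reduced to 50% or 33%.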