On the Provenance Extraction Techniques from Large Scale Log Files: A Case Study for the Numerical Weather Prediction Models

Workshops held at the 26th International Conference on Parallel and Distributed Computing, Euro-Par 2020, Virtual, Online, 24-25 August 2020, vol. 12480 LNCS, pp. 249-260

Publication Type: Conference Paper / Full-Text Paper
Volume Number: 12480 LNCS
DOI: 10.1007/978-3-030-71593-9_20
City of Publication: Virtual, Online
Pages: pp. 249-260
Keywords: Machine learning-based provenance extraction, Numerical weather prediction models, Provenance, Provenance analysis, Weather forecast models
Affiliated with Yıldız Technical University: Yes

Abstract

Severe meteorological events increasingly highlight the importance of fast and accurate weather forecasting. Various Numerical Weather Prediction (NWP) models are run worldwide, on either a local or a global scale, to predict future weather. However, NWP models typically take hours to complete a run, depending on the input parameters and the size of the forecast domain. Provenance information is therefore of central importance for detecting unexpected events that may develop during model execution and for taking the necessary action as early as possible. In addition, the need to share scientific data and results among researchers highlights the importance of data quality and reliability. In this study, we develop a framework for tracking the Weather Research and Forecasting (WRF) model and for generating, storing, and analyzing provenance data. We develop a machine-learning-based log parser that makes the proposed system dynamic, so that it can adapt to different data and rules. The proposed system makes numerical weather forecast workflows easier to manage and understand by providing provenance graphs. By analyzing these graphs, potential faulty situations that may occur during the execution of WRF can be traced to their root causes. The proposed system has been evaluated and shown to perform well even under a high-frequency flow of provenance information.