On the provenance extraction techniques from large scale log files

Tufek, Alper; AKTAŞ, Mehmet

doi:10.1002/cpe.6559

Tufek A., AKTAŞ M. S.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol.35, no.15, 2023 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 35 Issue: 15
Publication Date: 2023
DOI Number: 10.1002/cpe.6559
Journal Name: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
Keywords: machine learning-based provenance extraction, numerical weather prediction models, provenance, provenance analysis
Yıldız Technical University Affiliated: Yes

Abstract

Numerical weather prediction (NWP) models are the most important instruments for predicting future weather. Provenance information is of central importance for detecting unexpected events that may develop during the long course of model execution. In addition, the need to share scientific data and results among researchers highlights the importance of data quality and reliability. The Weather Research and Forecasting (WRF) model is an open-source NWP model. In this study, we propose a methodology for tracking the WRF model and for generating, storing, and analyzing provenance. We implement the proposed methodology with a machine learning-based parser, which uses classification algorithms to extract provenance information. The proposed approach enables easy management and understanding of numerical weather forecast workflows by providing provenance graphs. By analyzing these graphs, potential faults that may occur during the execution of WRF can be traced to their root causes. We evaluated the proposed approach and found that it performs well even under a high-frequency flow of provenance information.
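
To illustrate what a classification-based log parser might look like, the following is a minimal sketch and not the authors' implementation. The abstract does not name the classifier, the features, or the label set, so everything here is an assumption: the multinomial naive Bayes model, the `activity` / `entity` / `other` labels (borrowed from common provenance vocabulary), the helper names, and the log lines themselves, which are invented and only loosely resemble WRF output.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(line):
    # Keep lowercase alphabetic tokens only, so timestamps and numbers are ignored.
    return re.findall(r"[a-z_]+", line.lower())


class NaiveBayesLineClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, lines, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for line, label in zip(lines, labels):
            self.token_counts[label].update(tokenize(line))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, line):
        tokens = tokenize(line)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled log lines; real training data would come from actual logs.
TRAIN = [
    ("starting module real", "activity"),
    ("starting module wrf", "activity"),
    ("finished module real", "activity"),
    ("opening file wrfinput_d01 for reading", "entity"),
    ("writing file wrfout_d01 for output", "entity"),
    ("opening file wrfbdy_d01 for reading", "entity"),
    ("timing for main on domain 1", "other"),
    ("timing for step on domain 1", "other"),
]

clf = NaiveBayesLineClassifier().fit(
    [line for line, _ in TRAIN], [label for _, label in TRAIN]
)


def extract_provenance(lines, classifier):
    # Keep only lines classified as provenance-relevant, paired with their label.
    records = []
    for line in lines:
        label = classifier.predict(line)
        if label != "other":
            records.append((label, line))
    return records


new_lines = [
    "starting module ungrib",
    "timing for main on domain 2",
    "opening file met_em for reading",
]
records = extract_provenance(new_lines, clf)
```

In this sketch, `records` keeps the lines predicted to carry provenance and drops routine timing output. A real pipeline would additionally link the extracted activities and entities into a provenance graph, which is beyond what the abstract specifies.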