On the provenance extraction techniques from large scale log files


Tufek A., AKTAŞ M. S.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021 (Peer-Reviewed Journal)

  • Publication Type: Article
  • Publication Date: 2021
  • Doi Number: 10.1002/cpe.6559
  • Journal Name: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
  • Journal Indexes: Science Citation Index Expanded, Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Keywords: machine learning-based provenance extraction, numerical weather prediction models, provenance, provenance analysis

Abstract

Numerical weather prediction (NWP) models are the most important instruments for predicting future weather. Provenance information is of central importance for detecting unexpected events that may develop during the long course of model execution. In addition, the need to share scientific data and results between researchers highlights the importance of data quality and reliability. The Weather Research and Forecasting (WRF) model is an open-source NWP model. In this study, we propose a methodology for tracking the WRF model and for generating, storing, and analyzing provenance. We implement the proposed methodology with a machine learning-based parser, which uses classification algorithms to extract provenance information from log files. The proposed approach makes numerical weather forecast workflows easier to manage and understand by providing provenance graphs. By analyzing these graphs, faulty situations that may occur during the execution of WRF can be traced to their root causes. The proposed approach has been evaluated and shown to perform well even under a high-frequency flow of provenance information.
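
As a rough illustration of the idea in the abstract, the sketch below classifies log lines into provenance relations and then walks the resulting graph backwards from an output. It is not the authors' parser: the abstract does not name the classification algorithms used, so a small multinomial Naive Bayes classifier is swapped in here as a stand-in. The log lines, the labels `used` / `generated` / `other` (loosely modeled on W3C PROV relation names), the activity name `wrf_run`, and the helper names `extract_edges` and `trace_inputs` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training data: (log line, provenance label). The line formats
# are invented for illustration and are not real WRF output.
TRAIN = [
    ("opened input file wrfinput_d01 for reading", "used"),
    ("reading boundary file wrfbdy_d01", "used"),
    ("opened input file namelist for reading", "used"),
    ("writing output file wrfout_d01", "generated"),
    ("wrote restart file wrfrst_d01", "generated"),
    ("writing history output file", "generated"),
    ("timing for main step elapsed seconds", "other"),
    ("domain step completed elapsed seconds", "other"),
]

def tokenize(line):
    """Lowercase the line and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", line.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, samples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for line, label in samples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(line))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, line):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in tokenize(line):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def extract_edges(model, activity, lines):
    """Turn classified log lines into (activity, relation, artifact) edges."""
    edges = []
    for line in lines:
        label = model.predict(line)
        match = re.search(r"\b(wrf\w+)", line)  # assumed artifact naming
        if label in ("used", "generated") and match:
            edges.append((activity, label, match.group(1)))
    return edges

def trace_inputs(edges, artifact):
    """Walk generated/used edges backwards to collect upstream artifacts."""
    upstream, stack = set(), [artifact]
    while stack:
        current = stack.pop()
        for act, rel, art in edges:
            if rel == "generated" and art == current:
                for act2, rel2, src in edges:
                    if act2 == act and rel2 == "used" and src not in upstream:
                        upstream.add(src)
                        stack.append(src)
    return upstream

model = NaiveBayes().fit(TRAIN)
new_lines = [
    "opened input file wrfinput_d02 for reading",
    "writing output file wrfout_d02",
    "timing for main step elapsed seconds",
]
edges = extract_edges(model, "wrf_run", new_lines)
print(edges)
print(trace_inputs(edges, "wrfout_d02"))
```

With edges of this shape, tracing a suspicious output back to the inputs of the run that produced it reduces to a graph traversal, which is the kind of root-cause analysis the abstract describes. A real deployment would need labeled lines from actual WRF logs and a persistent provenance store, neither of which is shown here.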