On the distributed software architecture of a data analysis workflow: A case study


Tasgetiren N., Tigrak U., Bozan E., Gul G., Demirci E., Saribiyik H., ...Daha Fazla

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, cilt.34, sa.9, 2022 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 34 Sayı: 9
  • Basım Tarihi: 2022
  • Doi Numarası: 10.1002/cpe.6522
  • Dergi Adı: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Anahtar Kelimeler: data analysis workflow, distributed software architecture, facade design pattern, lambda software architecture, machine learning workflows
  • Yıldız Teknik Üniversitesi Adresli: Evet

Özet

Hybrid distributed computing software architectures gain great importance in data analysis workflows as the number of available underlying machine learning libraries and data storage systems increase. We argue that there is a need for novel approaches for software architecture designs that can enable machine learning data analysis workflows to run on top of different subsystem libraries. To address this need, we propose a hybrid distributed software architecture in this manuscript. The proposed architecture manages machine learning models for both supervised and unsupervised machine learning data analysis workflows. To show the usability of the proposed architecture, we implement a prototype for the banking sector as a case study. The prototype application includes two data analysis workflows: a workflow for predicting the loan usage tendency of customers, and a workflow for clustering the customers based on the usage patterns of banking loans. The prototype is tested on a large scale banking dataset. Performance tests were carried out to investigate the performance in terms of both responsiveness and scalability of the system. The results obtained reveal the usability of the proposed architecture.