On the distributed software architecture of a data analysis workflow: A case study

Tasgetiren, Nail; Tigrak, Umit; Bozan, Erdal; Gul, Guven; Demirci, Emir; Saribiyik, Hakan; Aktaş, Mehmet

doi:10.1002/cpe.6522

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol.34, no.9, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 34 Issue: 9
Publication Date: 2022
DOI: 10.1002/cpe.6522
Journal Name: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
Keywords: data analysis workflow, distributed software architecture, facade design pattern, lambda software architecture, machine learning workflows
Yıldız Technical University Affiliated: Yes

Abstract

Hybrid distributed computing software architectures are gaining importance in data analysis workflows as the number of available underlying machine learning libraries and data storage systems increases. We argue that novel software architecture designs are needed to enable machine learning data analysis workflows to run on top of different subsystem libraries. To address this need, we propose a hybrid distributed software architecture in this manuscript. The proposed architecture manages machine learning models for both supervised and unsupervised machine learning data analysis workflows. To show the usability of the proposed architecture, we implement a prototype for the banking sector as a case study. The prototype application includes two data analysis workflows: a workflow for predicting the loan usage tendency of customers, and a workflow for clustering customers based on their usage patterns of banking loans. The prototype is tested on a large-scale banking dataset. Performance tests are carried out to investigate the responsiveness and scalability of the system. The results obtained demonstrate the usability of the proposed architecture.
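
The idea of running workflows on top of different subsystem libraries, together with the facade design pattern named in the keywords, can be illustrated with a minimal sketch. This is not the authors' implementation: every class and method name below (`Backend`, `MeanThresholdClassifier`, `TwoBucketClusterer`, `AnalysisFacade`) is hypothetical, and the two toy models are stand-ins for real machine learning libraries, chosen only so the example runs without external dependencies.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence

Row = Sequence[float]


class Backend(ABC):
    """Hypothetical adapter interface; one implementation per underlying library."""

    @abstractmethod
    def fit(self, rows: Sequence[Row], labels: Optional[Sequence[int]] = None) -> None:
        ...

    @abstractmethod
    def predict(self, rows: Sequence[Row]) -> List[int]:
        ...


class MeanThresholdClassifier(Backend):
    """Toy supervised stand-in: thresholds the first feature at the midpoint of the class means."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def fit(self, rows: Sequence[Row], labels: Optional[Sequence[int]] = None) -> None:
        if labels is None:
            raise ValueError("supervised workflow needs labels")
        pos = [r[0] for r, y in zip(rows, labels) if y == 1]
        neg = [r[0] for r, y in zip(rows, labels) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict(self, rows: Sequence[Row]) -> List[int]:
        return [1 if r[0] >= self.threshold else 0 for r in rows]


class TwoBucketClusterer(Backend):
    """Toy unsupervised stand-in: splits rows into two clusters around the mean of the first feature."""

    def __init__(self) -> None:
        self.center = 0.0

    def fit(self, rows: Sequence[Row], labels: Optional[Sequence[int]] = None) -> None:
        self.center = sum(r[0] for r in rows) / len(rows)

    def predict(self, rows: Sequence[Row]) -> List[int]:
        return [1 if r[0] >= self.center else 0 for r in rows]


class AnalysisFacade:
    """Single entry point; callers name a workflow and never touch a backend directly."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, workflow: str, backend: Backend) -> None:
        self._backends[workflow] = backend

    def train(self, workflow: str, rows: Sequence[Row],
              labels: Optional[Sequence[int]] = None) -> None:
        self._backends[workflow].fit(rows, labels)

    def run(self, workflow: str, rows: Sequence[Row]) -> List[int]:
        return self._backends[workflow].predict(rows)
```

Under these assumptions, a caller would register one backend per workflow (for instance a loan-tendency predictor and a customer-clustering model) and then use only `train` and `run`, so swapping the library behind a workflow would not change the calling code.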