An experimental and comparative benchmark study examining resource utilization in managed Hadoop context


Ozdil U. E., Ayvaz S.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, vol. 26, no. 3, pp. 1891-1915, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 26, Issue: 3
  • Publication Date: 2023
  • DOI: 10.1007/s10586-022-03728-7
  • Journal Name: CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Pages: pp. 1891-1915
  • Keywords: Big Data, Managed Hadoop, Hadoop-on-PaaS, HiBench, Performance evaluation
  • Yıldız Technical University Affiliation: Yes

Abstract

Transitioning cloud-based Hadoop frameworks from IaaS to PaaS offerings, which are commercially framed as pay-as-you-go or pay-per-use, often reduces the associated system costs. However, managed Hadoop systems obscure the internal performance dynamics of the platform and behave as a black box for end users. The aim of this study was to investigate the resource utilization of current managed Hadoop platforms. To this end, we examined three prominent Hadoop-on-PaaS offerings in their out-of-the-box configurations and ran Hadoop-specific workloads using the HiBench Benchmark Suite. During the benchmark executions, system resource utilization data were collected from the worker nodes and analyzed. The results indicated that identical property specifications across cloud services neither guarantee similar performance nor yield consistent results across different workloads within the same service. We anticipate that the architectures and pre-configurations of the managed systems play a crucial role in the performance outcomes.
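The abstract does not state how the worker-node utilization data were gathered. Purely as an illustration, the sketch below shows one common way to measure CPU utilization on a Linux worker node: take two snapshots of the kernel's `/proc/stat` counters and compute the busy share of the CPU time that elapsed between them. The function names (`parse_cpu_line`, `cpu_utilization`) are hypothetical, and this is not presented as the authors' collection tooling.

```python
import os
import time


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: str, after: str) -> float:
    """Busy fraction (0.0-1.0) of CPU time between two /proc/stat snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already included in user/nice, so only the first
    # eight columns are summed to avoid double counting.
    total = sum(a[:8]) - sum(b[:8])
    idle = (a[3] + a[4]) - (b[3] + b[4])  # idle + iowait count as not busy
    if total <= 0:
        return 0.0
    return 1.0 - idle / total


if __name__ == "__main__":
    # Only sample the live counters where /proc/stat exists (Linux).
    if os.path.exists("/proc/stat"):
        def read_stat() -> str:
            with open("/proc/stat") as f:
                return f.read()

        first = read_stat()
        time.sleep(1.0)
        print(f"CPU utilization over 1 s: {cpu_utilization(first, read_stat()):.1%}")
```

Repeating such samples at a fixed interval on every worker node during a benchmark run yields a utilization time series per node, which can then be compared across services and workloads.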