Implementation of Association Rule Mining Algorithms on Distributed Data Processing Platforms


SESVER D., TUNA S., AKTAŞ M. S., KALIPSIZ O., KANLI A., TURGUT U.

International Conference on Computer Science and Engineering (UBMK) 2018, 11 - 15 September 2019

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/ubmk.2019.8907040
  • Keywords: Association Rule Mining Algorithms, Big Data Processing and Analysis Platforms, Apriori Algorithm, Eclat Algorithm, Pascal Algorithm, Spark Platform
  • Yıldız Technical University Affiliated: Yes

Abstract

Association rule mining is a frequently used data mining technique that aims to find the items that occur frequently in the data. Today's big data processing and analysis platforms are not focused on data mining, so they do not offer large-scale libraries for association rule mining algorithms. In this research, a library of association rule mining algorithms has been developed for a big data processing platform. The Apache Spark platform was chosen for the case study because of its widespread use. The algorithms were implemented on this platform in ways that benefit from the MapReduce programming model. In this context, the Apriori, Eclat and Pascal algorithms were implemented for the big data platform. The library produced by the proposed implementation method is comparatively analyzed in terms of performance metrics on big data processing platforms with single and multiple nodes. The implemented methods are also compared with the FP-Growth algorithm provided by the Spark platform. The results show that, when tested on large-scale data, the Apriori algorithm gives much better performance values than the other algorithms when switching from a single-node cluster environment to a multi-node cluster environment.
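
As an illustration of the idea named in the abstract, the following is a minimal sketch of Apriori-style frequent itemset counting written as explicit map and reduce phases. It is plain Python and does not use Spark, and it is not the authors' library. The function names (`map_phase`, `reduce_phase`, `apriori`), the sample transactions and the support threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, candidates):
    """Map step: emit (candidate, 1) for each candidate contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each candidate itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def apriori(transactions, min_support):
    """Return {frozenset: count} for every itemset whose count is >= min_support."""
    # Level-1 candidates: every single item seen in the data.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        counts = reduce_phase(pairs)
        level = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    # Hypothetical sample data, chosen only for illustration.
    sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in apriori(sample, min_support=3).items():
        print(sorted(itemset), count)
```

On a distributed platform, the map phase would run per data partition and the reduce phase would aggregate counts by key; here both run in a single process purely to show the control flow.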