A Hybrid CNN and RNN Variant Model for Music Classification


Creative Commons License

Ashraf M., Abid F., Din I. U., Rasheed J., Yesiltepe M., Yeo S. F., ...

Applied Sciences (Switzerland), vol. 13, no. 3, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 13 Issue: 3
  • Publication Date: 2023
  • DOI: 10.3390/app13031476
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: music classification, music information retrieval, convolutional neural network, recurrent neural network, Mel-spectrogram
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

© 2023 by the authors. Music genre classification has a significant role in information retrieval for the organization of growing collections of music. It is challenging to classify music with reliable accuracy. Many methods have utilized handcrafted features to identify unique patterns but are still unable to determine the original music characteristics. Comparatively, music classification using deep learning models has been shown to be dynamic and effective. Among the many neural networks, the combination of a convolutional neural network (CNN) and variants of a recurrent neural network (RNN) has not been significantly considered. Additionally, addressing the flaws in the particular neural network classification model, this paper proposes a hybrid architecture of CNN and variants of RNN such as long short-term memory (LSTM), Bi-LSTM, gated recurrent unit (GRU), and Bi-GRU. We also compared the performance based on Mel-spectrogram and Mel-frequency cepstral coefficient (MFCC) features. Empirically, the proposed hybrid architecture of CNN and Bi-GRU using Mel-spectrogram achieved the best accuracy at 89.30%, whereas the hybridization of CNN and LSTM using MFCC achieved the best accuracy at 76.40%.