A Hybrid CGAN-FNN Framework for Robust Hand Gesture Recognition in Real-Time Systems


Toprak B., Koroglu O., Bayram O., GÜR E., İŞCAN M.

7th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, ICHORA 2025, Ankara, Türkiye, 23-24 May 2025 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/ichora65333.2025.11017280
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Keywords: CGAN, Data Augmentation, FNN, Hand Gesture Recognition, Hybrid Architecture, Machine Learning, Real-Time
  • Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

This study proposes a novel approach that combines Conditional Generative Adversarial Network (CGAN) and Feedforward Neural Network (FNN) architectures to generate high-variability, balanced sEMG datasets, with the aim of improving neural network (NN) classification performance while remaining applicable to real-time operating systems. The low quality of sEMG datasets, which is influenced by factors such as muscle fatigue, noise, and physiological differences among patients, negatively affects the performance of both probabilistic models and NN models, making data generation a crucial task. To evaluate the impact of CGAN-generated data on classification performance, Naïve Bayes (NB) and Gaussian Mixture Model (GMM) classifiers are employed as comparative methods, providing insight into the performance changes of the proposed FNN model. The developed CGAN model is designed to generate 1,000 new synthetic data samples for each of the five fundamental hand gestures (extension, flexion, fist, rest, and spread) by conditioning the generator and discriminator models on the class labels of the real dataset used. A normal distribution filter was applied to the generated dataset to increase its resemblance to real data. After hyperparameter optimization and the application of a sliding window, the proposed FNN model achieved high, real-time-compatible classification results, including 88% classification accuracy using only 6 neurons in its single hidden layer, with an inference time of 6 milliseconds. This classification result is 14.3% better than that of the probabilistic methods tested under the same conditions.
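
The abstract does not specify how the sliding window or the normal distribution filter is implemented. The following is only a minimal sketch under stated assumptions: the filter is read here as keeping generated feature values that lie within k standard deviations of the real-data mean for a given gesture class, and the window size, step, threshold k, function names, and toy values are all hypothetical rather than taken from the study.

```python
from statistics import mean, stdev

def sliding_windows(signal, size, step):
    """Split a 1-D signal into overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def normal_filter(real, generated, k=2.0):
    """Keep generated values within k standard deviations of the real-data mean.

    Assumption: this is one plausible reading of a "normal distribution
    filter"; the study may define it differently.
    """
    mu, sigma = mean(real), stdev(real)
    return [g for g in generated if abs(g - mu) <= k * sigma]

# Toy feature values for one gesture class (hypothetical, not from the study).
real_feature = [0.50, 0.52, 0.48, 0.51, 0.49]
generated_feature = [0.50, 0.47, 0.90, 0.53, 0.10]

# Outliers far from the real-data distribution (0.90 and 0.10) are dropped.
kept = normal_filter(real_feature, generated_feature, k=2.0)

# Ten samples, window of 4, step of 2 gives four overlapping windows.
windows = sliding_windows(list(range(10)), size=4, step=2)
```

In a pipeline of the kind the abstract describes, a filter like this would run per gesture class on the CGAN output before training, while the windowing would run on the incoming signal at inference time.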