Synthesizing facial expressions in dyadic human–robot interaction

Sham, Abdallah; Tikka, Pia; Lamas, David; Anbarjafari, Gholamreza

doi:10.1007/s11760-024-03202-4

Sham A. H., Tikka P., Lamas D., Anbarjafari G.

Signal, Image and Video Processing, vol. 18, no. Suppl 1, pp. 909-918, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 18, Issue: Suppl 1
Publication Date: 2024
DOI: 10.1007/s11760-024-03202-4
Journal Name: Signal, Image and Video Processing
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
Page Numbers: pp. 909-918
Keywords: Emotion reaction, Emotion recognition, Facial reaction emotion synthesis, Responsible AI
Yıldız Technical University Affiliated: No

Abstract

Generative artificial intelligence (GenAI) can be used to create facial expressions of artificial human characters in real time based on a training dataset. However, the bottleneck that prevents natural dyadic interaction between an artificial character and a human lies in GenAI’s limited capability to recognize dynamically changing contexts. To tackle this issue, we investigated how deep learning (DL) techniques could synthesize facial reaction emotions based on a sequence of previous emotions. We applied action units from the Facial Action Coding System to manipulate facial points of an artificial character inside Unreal Engine 4 using the OpenFace API. First, the artificial character’s facial behavior was programmed to mimic human facial expressions on screen. To generate adequate reaction emotions, we then trained a DL model: an autoencoder with a long short-term memory model. To validate the performance of the trained model, we compared its reaction expressions with our test dataset using the average root-mean-square error. Furthermore, sixteen test participants reported the apparent naturalness of the character’s reactions to dynamic human expressions. Our findings are promising steps toward developing facial reaction emotion synthesis into a dynamic system that can adapt to the user’s specific needs and context.
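
The validation metric named in the abstract, average root-mean-square error, can be illustrated with a minimal sketch. The abstract does not say how the average is taken, so the code below assumes one plausible reading: RMSE is computed per frame over a vector of action unit intensities and then averaged over frames. The function names `rmse` and `average_rmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length intensity vectors."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("vectors must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def average_rmse(predicted_frames, target_frames):
    """Mean of the per-frame RMSE values (assumed averaging scheme)."""
    if len(predicted_frames) != len(target_frames) or not predicted_frames:
        raise ValueError("sequences must be non-empty and of equal length")
    per_frame = [rmse(p, t) for p, t in zip(predicted_frames, target_frames)]
    return sum(per_frame) / len(per_frame)


# Hypothetical action unit intensities: two frames, three action units each.
predicted = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
target = [[0.0, 1.0, 4.0], [1.0, 1.0, 1.0]]

score = average_rmse(predicted, target)
print(f"average RMSE: {score:.4f}")
```

A lower score means the synthesized reaction stays closer to the reference reaction in the test data; averaging over all values at once instead of per frame would be an equally valid reading of the abstract.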