SIGNAL IMAGE AND VIDEO PROCESSING, vol.17, no.2, pp.527-534, 2023 (SCI-Expanded)
While Facial Expression Recognition (FER) has become a broadly applied technology for tracking individual people's emotions, its application to estimating emotion in human-to-human dyadic interaction is still relatively sparse. This paper describes a study in which FER is applied to a dyadic video-mediated interaction to collect facial interaction data that are then used to predict the emotions of one of the interlocutors. To realize this, we used the histogram of oriented gradients (HOG) algorithm to detect human faces in every frame of the videos. Then, we used a Deep Neural Network (DNN) model to detect the facial expressions of the two people conversing in the videos. We measured the facial patterns of both interlocutors as indicators of their emotions throughout the whole interaction. Afterward, we trained a Long Short-Term Memory (LSTM) model to estimate one person's emotions from the video. We performed the analysis on videos of a specific psychiatrist and his patients and then estimated the patients' emotions. This work shows how our multi-stage DNN (Mini-Xception) and LSTM models can predict the reaction emotions using the patient's facial expressions during the interaction. We believe that the proposed method can be applied to the future generation of facial expressions for virtual characters or social robots that interact with humans.
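To make the data flow between the expression-recognition stage and the LSTM stage concrete, the following is a minimal sketch of how per-frame expression outputs for two interlocutors might be arranged into training sequences. It is an illustration under stated assumptions, not the authors' implementation: the seven-class label set, the window length, the choice to concatenate both interlocutors' vectors as input, and the names `EMOTIONS`, `dominant_emotion`, and `make_windows` are all hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set: the abstract does not list the emotion classes used.
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")

Frame = Sequence[float]  # one per-frame score vector over EMOTIONS


def dominant_emotion(frame: Frame) -> int:
    """Return the index of the highest-scoring emotion in one frame."""
    return max(range(len(frame)), key=lambda i: frame[i])


def make_windows(
    person_a: Sequence[Frame],
    person_b: Sequence[Frame],
    window: int,
) -> List[Tuple[List[List[float]], int]]:
    """Build (input sequence, target label) pairs for a sequence model.

    Assumption: each input is `window` consecutive frames of both
    interlocutors' expression vectors concatenated, and the target is
    person B's dominant emotion in the frame right after the window.
    """
    if len(person_a) != len(person_b):
        raise ValueError("both interlocutors need the same number of frames")
    if window < 1:
        raise ValueError("window must be at least 1")

    samples = []
    for start in range(len(person_a) - window):
        joint = [
            list(person_a[t]) + list(person_b[t])
            for t in range(start, start + window)
        ]
        target = dominant_emotion(person_b[start + window])
        samples.append((joint, target))
    return samples
```

Each resulting sample has the shape (window, 2 × number of emotions), which is the kind of fixed-length sequence an LSTM layer typically consumes. In a real pipeline the score vectors would come from the face detector and the expression classifier rather than being constructed by hand.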