24th International Conference on Computational Science and Its Applications, ICCSA 2024, Hanoi, Vietnam, 1-4 July 2024, LNCS vol. 14819, pp. 149-164
This study explores the potential of Wav2Lip, a state-of-the-art lip-sync model, in multilingual settings. We assess its performance in generating lip-synchronized videos for Turkish, Persian, and Arabic. The evaluation results suggest that Wav2Lip is largely language-independent, achieving accuracy comparable to that obtained for English. The study identifies a gap in the literature on lip-sync models for diverse languages and emphasizes the need for broader exploration. Additionally, we introduce a comprehensive Face-to-Face Translation workflow that outlines the fundamental elements of a seamless cross-lingual communication system, and we highlight the importance of lip-sync models, and the potential of Wav2Lip in particular, within such a system. By acknowledging current limitations and advocating for advances in real-time models and high-resolution datasets, this study lays the groundwork for future Face-to-Face Translation systems that support barrier-free communication.