Generative AI for Video Translation: A Scalable Architecture for Multilingual Video Conferencing †

Rafiei Oskooei, Amirkia; Caglar, Eren; Şahin, İbrahim; Kayabay, Ayse; AKTAŞ, Mehmet

doi:10.3390/app152312691

Rafiei Oskooei A., Caglar E., Şahin I., Kayabay A., AKTAŞ M. S.

Applied Sciences (Switzerland), vol. 15, no. 23, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 15 Issue: 23
Publication Date: 2025
DOI: 10.3390/app152312691
Journal Name: Applied Sciences (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Keywords: applied computer vision, deep learning, generative AI, human–AI interaction, multimedia
Affiliated with Yıldız Teknik Üniversitesi: Yes

Abstract

The real-time deployment of cascaded generative AI pipelines for applications like video translation is constrained by significant system-level challenges. These include the cumulative latency of sequential model inference and the quadratic ($\mathcal{O}(N^2)$) computational complexity that renders multi-user video conferencing applications unscalable. This paper proposes and evaluates a practical system-level framework designed to mitigate these critical bottlenecks. The proposed architecture incorporates a turn-taking mechanism to reduce computational complexity from quadratic to linear in multi-user scenarios, and a segmented processing protocol to manage inference latency for a perceptually real-time experience. We implement a proof-of-concept pipeline and conduct a rigorous performance analysis across a multi-tiered hardware setup, including commodity (NVIDIA RTX 4060), cloud (NVIDIA T4), and enterprise (NVIDIA A100) GPUs. Our objective evaluation demonstrates that the system achieves real-time throughput ((Formula presented.)) on modern hardware. A subjective user study further validates the approach, showing that a predictable, initial processing delay is highly acceptable to users in exchange for a smooth, uninterrupted playback experience. The work presents a validated, end-to-end system design that offers a practical roadmap for deploying scalable, real-time generative AI applications in multilingual communication platforms.
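
The scaling argument and the trade-off between initial delay and smooth playback can be illustrated with a small sketch. This is not the paper's implementation. It assumes a worst case in which every listener needs a separately translated stream, and a simplified timing model in which fixed-length segments are captured and then processed one after another. The function names and the timing model are hypothetical and introduced only for illustration.

```python
def pipelines_without_turn_taking(n_users: int) -> int:
    """Assumed worst case: every user's stream is translated for every other user."""
    return n_users * (n_users - 1)  # grows quadratically with n_users


def pipelines_with_turn_taking(n_users: int) -> int:
    """Assumed turn-taking: only the one active speaker is translated, once per listener."""
    return max(n_users - 1, 0)  # grows linearly with n_users


def startup_delay(segment_seconds: float, processing_seconds: float) -> float:
    """Initial wait before playback under the simplified segment model.

    The first segment must be fully captured and then processed before it can
    be played. If each segment is processed no slower than it plays back,
    later segments are ready in time and playback does not stall.
    """
    if processing_seconds > segment_seconds:
        raise ValueError("processing is slower than real time; playback would stall")
    return segment_seconds + processing_seconds


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, pipelines_without_turn_taking(n), pipelines_with_turn_taking(n))
    print(startup_delay(segment_seconds=5.0, processing_seconds=3.0))
```

Under these assumptions, a ten-person call needs 90 concurrent translation pipelines without turn-taking and 9 with it, and the listener pays a one-time delay of one segment length plus one processing pass before continuous playback begins.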