Turkish Scene Text Recognition with a Lightweight and Robust Transformer (Hafif ve Gürbüz Dönüştürücü ile Türkçe Sahne Metni Tanıma)
33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Türkiye, 25-28 June 2025 (Full-Text Paper)
- Publication Type: Conference Paper / Full-Text Paper
- DOI: 10.1109/siu66497.2025.11111830
- City of Publication: İstanbul
- Country of Publication: Türkiye
- Keywords: optical character recognition, scene text recognition, vision transformer
- Affiliated with Yıldız Technical University: Yes
Abstract
In this study, we propose two lightweight vision transformers, ViT-TR-Tiny and ViT-TR-Nano, for scene text recognition. These models balance recognition accuracy against computational efficiency by significantly reducing overall network complexity. Experimental results show that the proposed models achieve competitive word accuracy, with only minor degradation compared to well-known approaches in the literature. Remarkably, the TensorRT-optimized ViT-TR-Tiny achieved 93.44% word accuracy on STRIT and 92.78% on TS-TR while processing 2264 images per second. These findings highlight the promise of efficient transformer-based architectures for tackling complex scene text recognition tasks, particularly in Turkish.
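The abstract reports two kinds of figures, word accuracy and throughput. The sketch below shows one common way to read them, and it is not taken from the paper. It assumes word accuracy means the fraction of predictions that match the reference string exactly, with no case or punctuation normalization; the paper's evaluation protocol may differ. The helper names `word_accuracy` and `ms_per_image` and the sample words are hypothetical.

```python
def word_accuracy(predictions, references):
    """Fraction of predictions that equal their reference string exactly.

    Assumption: exact, case-sensitive matching with no normalization.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def ms_per_image(images_per_second):
    """Average time per image in milliseconds implied by a throughput figure."""
    return 1000.0 / images_per_second


# Hypothetical predictions and labels; the last pair differs only in the
# dotless "ı", a typical Turkish-specific error, and counts as a miss.
preds = ["çıkış", "eczane", "durak", "firin"]
refs = ["çıkış", "eczane", "durak", "fırın"]

accuracy = word_accuracy(preds, refs)
latency_ms = ms_per_image(2264)  # throughput reported in the abstract
print(f"word accuracy: {accuracy:.2%}")
print(f"average time per image: {latency_ms:.3f} ms")
```

Because the metric is computed per word, a single wrong character makes the whole word incorrect. The time per image is an average implied by the throughput figure. It is not a per-request latency, since batched inference can process many images at once.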