Cosmos-LLaVA: Chatting with the Visual (Cosmos-LLaVA: Görselle Sohbet Etmek)

Zeer A., Dogan E., Erdem Y., Ince E., Shbib O., UZUN M., ...

8th International Artificial Intelligence and Data Processing Symposium, IDAP 2024, Malatya, Türkiye, 21-22 September 2024 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI Number: 10.1109/idap64064.2024.10710874
City of Publication: Malatya
Country of Publication: Türkiye
Keywords: artificial intelligence, generative models, human evaluation, large language models, LLM-as-a-judge, multimodality, natural language processing, visual question answering
Affiliated with Yıldız Technical University: Yes

Abstract

In this study, a Turkish visual instruction model was developed, and various model architectures and dataset combinations were analysed to improve its performance. The Cosmos-LLaVA model, built by combining different large language models and image encoders, is designed to address the shortcomings of existing models in Turkish. The experiments examine in detail how fine-tuning with various datasets affects model performance. The results show that model architecture and dataset selection have a significant impact on performance.