Cosmos-LLaVA: Chatting with the Visual (Cosmos-LLaVA: Görselle Sohbet Etmek)


Zeer A., Dogan E., Erdem Y., Ince E., Shbib O., UZUN M., ...

8th International Artificial Intelligence and Data Processing Symposium, IDAP 2024, Malatya, Türkiye, 21-22 September 2024

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/idap64064.2024.10710874
  • City of Publication: Malatya
  • Country of Publication: Türkiye
  • Keywords: artificial intelligence, generative models, human evaluation, large language models, LLM-as-a-judge, multimodality, natural language processing, visual question answering
  • Affiliated with Yıldız Technical University: Yes

Abstract

In this study, a Turkish visual instruction model was developed, and various model architectures and dataset combinations were analysed to improve its performance. The Cosmos-LLaVA model, built by combining different large language models and image encoders, is designed to address the shortcomings of existing models in Turkish. The experiments analyse in detail how fine-tuning with various datasets affects model performance. The results show that both model architecture and dataset selection have a significant impact on performance.