Cosmos-LLaVA: Chatting with the Visual (Cosmos-LLaVA: Görselle Sohbet Etmek)


Zeer A., Dogan E., Erdem Y., Ince E., Shbib O., Uzun M., et al.

8th International Artificial Intelligence and Data Processing Symposium, IDAP 2024, Malatya, Turkey, 21 - 22 September 2024 (Full Text)

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/idap64064.2024.10710874
  • City: Malatya
  • Country: Turkey
  • Keywords: artificial intelligence, generative models, human evaluation, large language models, LLM-as-a-judge, multimodality, natural language processing, visual question answering
  • Yıldız Technical University Affiliated: Yes

Abstract

In this study, a Turkish visual instruction model was developed, and various model architectures and dataset combinations were analysed to improve its performance. The Cosmos-LLaVA model, built by combining different large language models and image encoders, is designed to address the shortcomings of existing models in Turkish. The experiments analyse in detail how fine-tuning with various datasets affects model performance. The results show that both model architecture and dataset selection have a significant impact on performance.