Turkish-e5: E5 Model Enhanced for Turkish with Multi-Positive Contrastive Learning (Turkish-e5: Çoklu Pozitif Örneklemeli Karşıtsal Öğrenme ile Türkçesi Güçlendirilmiş E5 Modeli)


Izdas T., Sancak O., KESGİN H. T., YÜCE M. K., AMASYALI M. F.

33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Turkey, 25 - 28 June 2025, (Full Text)

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu66497.2025.11112375
  • City: İstanbul
  • Country: Turkey
  • Keywords: contrastive learning, embedding model, MTEB, multi-positive sampling, retrieval
  • Yıldız Technical University Affiliated: Yes

Abstract

In this study, the multilingual embedding model intfloat/multilingual-e5-large-instruct was fine-tuned for Turkish retrieval tasks using a multi-positive sampling approach. In conventional fine-tuning, each query is typically paired with a single correct answer (positive sample). Here, a dataset was instead constructed in which each query is paired with two positives: its direct answer and the contextual passage containing that answer. This lets the model learn to retrieve both direct responses and the relevant information within the context. The model was evaluated on MTEB benchmark tasks and on examples drawn from three independent datasets. Experimental results indicate a significant improvement in the retrieval performance of the fine-tuned model, most notably a substantial increase in R@1, which measures how often the best answer is ranked first, together with improved MTEB results.
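
The abstract does not state the exact training objective, so the following is only a minimal sketch of one common way to handle several positives per query: an InfoNCE-style loss averaged over all positives, plus an R@1 check. The function names, the toy 2-dimensional vectors, the temperature value, and the choice to count a hit when any positive is ranked first are all illustrative assumptions, not details taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def multi_positive_info_nce(query, candidates, positive_ids, temperature=0.05):
    """InfoNCE-style loss averaged over every positive of one query.

    candidates: embedding vectors (positives and negatives together)
    positive_ids: indices in `candidates` that are correct for the query
    NOTE: hypothetical formulation; the paper's exact loss is not given.
    """
    q = normalize(query)
    # Cosine similarity scaled by the temperature.
    logits = [dot(q, normalize(c)) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Each positive contributes -log softmax(logit_p); average over positives.
    return sum(log_z - logits[p] for p in positive_ids) / len(positive_ids)

def recall_at_1(queries, candidates, positives_per_query):
    """Fraction of queries whose top-ranked candidate is one of their positives."""
    hits = 0
    for q, positives in zip(queries, positives_per_query):
        qn = normalize(q)
        scores = [dot(qn, normalize(c)) for c in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best in positives
    return hits / len(queries)

# Toy example (made-up vectors): one query with two positives.
query = [1.0, 0.0]
candidates = [
    [0.9, 0.1],   # direct answer (positive)
    [0.8, 0.3],   # context passage containing the answer (positive)
    [0.0, 1.0],   # unrelated passage (negative)
    [-1.0, 0.2],  # unrelated passage (negative)
]
loss = multi_positive_info_nce(query, candidates, positive_ids=[0, 1])
r1 = recall_at_1([query], candidates, [{0, 1}])
```

In this sketch the loss falls as both the direct answer and its context passage move closer to the query than the negatives do, which matches the intuition the abstract describes. A real fine-tuning run would compute the same quantity on batched model embeddings in a deep learning framework rather than on hand-written lists.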