33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Türkiye, 25-28 June 2025 (Full Text Paper)
In this study, the multilingual embedding model intfloat/multilingual-e5-large-instruct was fine-tuned for Turkish retrieval tasks using a multi-positive sampling approach. In conventional fine-tuning, each query is typically associated with only a single correct answer (positive sample). In this study, by contrast, a dataset was constructed in which each query is paired with two positives: its direct answer and the contextual passage containing that answer. This enables the model to improve retrieval by learning to match queries both to direct responses and to relevant information embedded in context. The model's performance was evaluated on MTEB benchmark tests as well as on examples drawn from three independent datasets. Experimental results show a significant improvement in the retrieval performance of the fine-tuned model. Notably, a substantial increase was observed in the R@1 metric, which measures how often the best answer is ranked first, alongside clear gains on the MTEB results, demonstrating improved retrieval accuracy.
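The multi-positive data construction and the R@1 metric described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual pipeline: the function names (`build_multi_positive_pairs`, `recall_at_1`) and the record fields (`query`, `answer`, `context`) are hypothetical, and in practice such pairs would be fed to a contrastive objective with in-batch negatives (e.g. a multiple-negatives ranking loss) during fine-tuning.

```python
def build_multi_positive_pairs(records):
    """Expand each record into two (query, positive) training pairs:
    one with the direct answer, one with the context containing it.
    Field names are illustrative assumptions, not the paper's schema."""
    pairs = []
    for rec in records:
        pairs.append((rec["query"], rec["answer"]))   # direct answer as positive
        pairs.append((rec["query"], rec["context"]))  # containing passage as positive
    return pairs


def recall_at_1(ranked_ids, gold_ids):
    """R@1: fraction of queries whose top-ranked document is the gold answer."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if ranked[0] == gold)
    return hits / len(gold_ids)


# Toy usage: one record yields two positives; R@1 over two queries.
records = [{"query": "q1", "answer": "a1", "context": "c1"}]
pairs = build_multi_positive_pairs(records)   # [("q1", "a1"), ("q1", "c1")]
score = recall_at_1([["d1", "d2"], ["d3", "d1"]], ["d1", "d1"])  # 0.5
```

Pairing each query with both an answer-level and a context-level positive doubles the supervision per query and, as the abstract notes, encourages the encoder to rank both granularities of relevant text highly.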