2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, Bursa, Türkiye, 10 - 12 September 2025 (Full Text Paper)
Large Language Models (LLMs) play a central role in question answering but are limited by static training data, lack explicit references, and are prone to hallucinations. These limitations become critical when up-to-date or domain-specific information is needed. Retrieval-Augmented Generation (RAG) systems address this by retrieving relevant information from external sources to enhance generation. Yet many real-world questions require multiple contexts from diverse sources, while current embedding models are mostly optimized for single-context retrieval and fail to retrieve all relevant information efficiently. Despite the importance of multi-context retrieval, this area remains underexplored, particularly for low-resource languages such as Turkish, where no dedicated dataset exists. This study introduces Turkuaz-RAG, the first Turkish benchmark for multi-context retrieval. Built from a large news corpus, it contains over 2,500 question-context-answer triplets across five diverse question types, including comparison and temporal reasoning. The dataset supports the evaluation of multilingual and Turkish embedding models, as well as retrieval methods tailored for multi-context fusion. Turkuaz-RAG aims to fill a significant research gap in multi-context retrieval and to provide a valuable resource for future work on embedding, retrieval, and question answering in Turkish.
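To make the evaluation setting concrete, the sketch below illustrates what a multi-context question-context-answer triplet and a retrieval recall metric over it might look like. It is a minimal illustration, not the Turkuaz-RAG schema or evaluation code: the field names, the example texts, and the bag-of-words similarity (standing in for a real embedding model) are all hypothetical placeholders.

```python
# Hypothetical sketch of a multi-context QA triplet and a recall@k check.
# Field names and the toy lexical scorer are illustrative assumptions,
# not the actual Turkuaz-RAG data format or embedding models.
from collections import Counter
import math

# One hypothetical example whose answer requires two gold contexts.
example = {
    "question": "When did Company A and Company B announce their results?",
    "contexts": [
        "Company A announced its results on March 3.",
        "Company B announced its results on March 10.",
    ],
    "answer": "Company A on March 3 and Company B on March 10.",
}

# A tiny corpus: the two gold contexts plus distractor documents.
corpus = example["contexts"] + [
    "Unrelated article about the weather forecast.",
    "Another unrelated sports report from the weekend.",
]

def bow_cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity, used in place of an embedding model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def recall_at_k(question: str, gold: list[str], docs: list[str], k: int) -> float:
    """Fraction of gold contexts that appear among the top-k retrieved documents."""
    ranked = sorted(docs, key=lambda d: bow_cosine(question, d), reverse=True)[:k]
    return sum(g in ranked for g in gold) / len(gold)

print(recall_at_k(example["question"], example["contexts"], corpus, k=2))
```

A single-context retriever that stops after the best-matching document would cover at most half of the gold contexts here; multi-context evaluation of this kind is what the benchmark is intended to measure, with learned embedding models in place of the lexical scorer above.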