TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement (Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları)


Bayram M. A., Arda Fincan A., Gumus A. S., Diri B., Yildirim S., Aytas O.

33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025, İstanbul, Turkey, 25-28 June 2025 (Full Text)

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu66497.2025.11112154
  • City: İstanbul
  • Country: Turkey
  • Keywords: Artificial Intelligence, Large Language Models (LLM), Natural Language Processing (NLP), Turkish NLP
  • Yıldız Technical University Affiliated: Yes

Abstract

Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.
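
The abstract describes a multiple-choice benchmark organized into sections, so results are naturally reported as accuracy per section and overall. The following is a minimal sketch of that kind of scoring, not the authors' evaluation code. The record layout (keys `section`, `question`, `choices`, `answer`), the `predict` callback, and the toy items are all hypothetical and are not taken from the TR-MMLU dataset.

```python
from collections import defaultdict


def score_by_section(items, predict):
    """Return (per-section accuracy, overall accuracy).

    items   : iterable of dicts with hypothetical keys
              'section', 'question', 'choices', 'answer'
    predict : callable(question, choices) -> chosen answer
              (stands in for a call to a language model)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["section"]] += 1
        if predict(item["question"], item["choices"]) == item["answer"]:
            correct[item["section"]] += 1
    per_section = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_section, overall


# Toy placeholder items, invented for illustration only.
toy_items = [
    {"section": "history", "question": "Q1", "choices": ["A", "B", "C", "D"], "answer": "A"},
    {"section": "history", "question": "Q2", "choices": ["A", "B", "C", "D"], "answer": "B"},
    {"section": "biology", "question": "Q3", "choices": ["A", "B", "C", "D"], "answer": "A"},
]


def always_first(question, choices):
    """Trivial baseline that always picks the first choice."""
    return choices[0]


per_section, overall = score_by_section(toy_items, always_first)
print(per_section)
print(round(overall, 3))
```

Reporting per-section accuracy alongside the overall figure makes it possible to see which subject areas a model handles poorly, which a single aggregate number would hide.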