A Multi-Teacher Knowledge Distillation Framework for Enhancing the Robustness of Automated Sperm Morphology Assessment

TUTAY, Osman; İLHAN, Hamza; UZUN, HAKKI; HÜNER YİĞİT, MERVE; SERBES, Görkem

doi:10.3390/diagnostics16081230

TUTAY O. E., İLHAN H. O., UZUN H., HÜNER YİĞİT M., SERBES G.

Diagnostics, vol.16, no.8, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16 Issue: 8
Publication Date: 2026
DOI Number: 10.3390/diagnostics16081230
Journal Name: Diagnostics
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, EMBASE, Directory of Open Access Journals
Keywords: class imbalance, infertility, knowledge distillation, multi teacher learning, sperm morphology classification
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Yıldız Technical University: Yes

Abstract

Background/Objectives: Manual analysis of sperm morphology, which is crucial for diagnosing male infertility, is subjective and time-consuming. Automated methods based on deep learning offer a promising alternative; however, standard deep models are prone to overfitting on small, heavily imbalanced clinical datasets, which limits their generalization capability. This study proposes a knowledge distillation approach that functions as a strong regularizer, improving the robustness of automated sperm morphology analysis.
Methods: We use soft distillation to transfer knowledge from a set of high-capacity teacher models to a smaller student model (SwinV2-base). The teacher architectures are SwinV2-large, EfficientNetV2-m, and ConvNeXtV2-large. We investigated two distillation strategies: a single-teacher approach, in which the student learns from one specific architecture, and a multi-teacher approach, in which the student learns from the averaged response of multiple teachers. The models were trained on the imbalanced Hi-LabSpermMorpho dataset, which comprises 18 sperm morphology categories derived from three differently stained sample sets (BesLab, Histoplus, GBL). We adopted a cross-dataset training approach in which the teacher models were fine-tuned on the combination of two stained datasets and the student model was trained on the third, distinct stained dataset. The global loss function combined cross-entropy loss with Kullback–Leibler divergence, using the teacher’s soft probabilities to keep the student from becoming overconfident.
Results: The student model trained in a multi-teacher setup with augmentation and soft distillation attains higher accuracies (70.94% on BesLab, 73.61% on Histoplus, 71.63% on GBL) than the baseline models.
Conclusions: This approach mitigates challenges associated with data scarcity and heavily unbalanced sperm morphology datasets, providing consistent improvements and offering a highly generalizable solution for clinical diagnostics.
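
The loss described in the Methods (cross-entropy combined with Kullback–Leibler divergence against an averaged teacher response) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The abstract does not state whether teacher logits or probabilities are averaged, nor how the two terms are weighted, so the following are assumptions made only for illustration: averaging is done over temperature-softened probabilities, the weight `alpha` and the temperature `temperature` are hypothetical hyperparameters, and the KL term is scaled by the squared temperature, a common convention in the distillation literature. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_teachers(teacher_logits, temperature=1.0):
    """Average the softened class probabilities of several teachers.

    Assumption: the 'averaged response' is a mean over probabilities.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_teachers for c in range(n_classes)]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and KL(teacher || student).

    alpha and temperature are hypothetical values, not taken from the paper.
    """
    # Hard-label term: cross-entropy against the ground-truth class.
    ce = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL divergence from the averaged teacher distribution
    # to the student's temperature-softened distribution.
    p_teacher = average_teachers(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)

    # temperature**2 keeps the soft term's gradient scale comparable
    # to the hard term (assumed convention).
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Toy example with 3 classes and two teachers (made-up logits).
student = [2.0, 0.5, -1.0]
teachers = [[1.5, 1.0, -0.5], [1.0, 1.2, 0.0]]
loss = distillation_loss(student, teachers, label=0)
```

Passing a single-element list as `teacher_logits` reduces this to the single-teacher strategy. Because the teachers' softened probabilities spread mass over several classes, the KL term penalizes a student that puts nearly all its probability on one class, which is the regularizing effect the abstract attributes to soft distillation.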