A Multi-Teacher Knowledge Distillation Framework for Enhancing the Robustness of Automated Sperm Morphology Assessment


Creative Commons License

TUTAY O. E., İLHAN H. O., UZUN H., HÜNER YİĞİT M., SERBES G.

Diagnostics, vol.16, no.8, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 16 Issue: 8
  • Publication Date: 2026
  • Doi Number: 10.3390/diagnostics16081230
  • Journal Name: Diagnostics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, EMBASE, Directory of Open Access Journals
  • Keywords: class imbalance, infertility, knowledge distillation, multi teacher learning, sperm morphology classification
  • Open Archive Collection: AVESIS Open Access Collection
  • Yıldız Technical University Affiliated: Yes

Abstract

Background/Objectives: The manual analysis of sperm morphology, crucial for male infertility diagnosis, is subjective and time-consuming. Automated methods using deep learning offer a promising alternative; however, standard deep models are prone to overfitting when applied to small, heavily imbalanced clinical datasets, which limits their generalization capability. This study proposes a knowledge distillation approach that functions as a strong regularizer, improving the robustness of automated sperm morphology analysis.
Methods: We utilize soft distillation to transfer knowledge from a set of high-capacity teacher models to a smaller student model (SwinV2-base). The teacher architectures include SwinV2-large, EfficientNetV2-m, and ConvNeXtV2-large. To maximize performance, we investigated two distillation strategies: a single-teacher approach, where the student learns from one specific architecture, and a multi-teacher approach, where the student learns from the averaged response of multiple teachers. The models were trained on the imbalanced Hi-LabSpermMorpho dataset, which comprises 18 different sperm morphology categories derived from three differently stained (BesLab, Histoplus, GBL) sample sets. We adopted a cross-dataset training approach in which the teacher models were fine-tuned on the combination of two stained datasets, and the student model was trained on the third, distinct stained dataset. The global loss function combined cross-entropy loss with Kullback–Leibler divergence, using the teachers' soft probabilities to keep the student from becoming overconfident.
Results: The experimental results demonstrate that the student model trained in a multi-teacher setup with augmentation and soft distillation attains higher accuracies (70.94% on BesLab, 73.61% on Histoplus, 71.63% on GBL) than the baseline models.
Conclusions: This approach mitigates challenges associated with data scarcity and heavily imbalanced sperm morphology datasets, providing consistent improvements and offering a highly generalizable solution for clinical diagnostics.
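
The loss described in the Methods can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract states only that cross-entropy is combined with Kullback–Leibler divergence against the teachers' soft probabilities, and that multi-teacher targets are an averaged response. Everything else here is an assumption made for illustration: the temperature `temperature`, the mixing weight `alpha`, the choice to average softened probabilities rather than logits, the KL direction (teacher relative to student), the conventional temperature-squared scaling, and all function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives softer output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_teachers(teacher_logits, temperature):
    """Average the softened class probabilities of several teachers.

    Assumption: the "averaged response" is taken over probabilities, not logits.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    return [sum(column) / len(probs) for column in zip(*probs)]


def cross_entropy(student_logits, label):
    """Standard cross-entropy against the hard ground-truth label."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    alpha and temperature are hypothetical hyperparameters; the abstract
    does not report the values used in the study.
    """
    soft_target = average_teachers(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    ce = cross_entropy(student_logits, label)
    kl = kl_divergence(soft_target, soft_student)
    # Scaling by temperature**2 is the common convention that keeps the
    # soft-target term on a scale comparable to the cross-entropy term.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Toy usage with 3 classes (the dataset in the abstract has 18).
student = [2.0, 0.5, -1.0]
teachers = [[1.5, 1.0, -0.5], [1.8, 0.2, 0.1], [1.2, 0.9, -0.3]]
single_teacher_loss = distillation_loss(student, teachers[:1], label=0)
multi_teacher_loss = distillation_loss(student, teachers, label=0)
```

Under these assumptions the single-teacher strategy is the special case of a one-element teacher list. The soft-target term penalizes a student whose distribution is sharper than the teachers', which is one way to read the abstract's point about preventing overconfidence.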