Hi-LabSpermTracking: A Novel and High-Quality Sperm Tracking Dataset with an Advanced Ensemble Detection and Tracking Approach for Real-World Clinical Scenarios


AKTAŞ A., SERBES G., UZUN H., Yigit M. H., Aydın N., İLHAN H. O.

Advanced Intelligent Systems, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI Number: 10.1002/aisy.202500115
  • Journal Name: Advanced Intelligent Systems
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: dataset benchmark, deep learning, infertility, sperm detection and tracking
  • Affiliated with Yıldız Technical University: Yes

Abstract

Sperm motility, a critical factor in diagnosing male infertility, requires computer-based solutions due to the limitations of manual evaluation methods. This study introduces the Hi-LabSpermTracking dataset, comprising 66 videos (60 s each, 10 fps) collected from 14 patients and meticulously annotated by experts. Unlike similar datasets, these uninterrupted, long-duration videos enable continuous tracking of individual sperm cells, each assigned a unique ID throughout the video, supporting both sperm detection and tracking tasks. Experimental evaluations employ You Only Look Once v8 (YOLOv8), the Real-Time Detection Transformer (RT-DETR), and Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT) across three scenarios. In Scenario I (sperm detection), the YOLOv8n model achieves 98.9% mAP50 and 97.9% F1-score. In Scenario II (sperm tracking), the results are 83.88% mAP50, 87.63% F1-score, 72.27% higher order tracking accuracy (HOTA), and 77.88% multiple object tracking accuracy (MOTA). Scenario III simulates real-world challenges by separating training and testing videos. Ensemble methods are applied in this scenario, with the proposed mean ensemble achieving the best results: 86.55% mAP50, 87.87% F1-score, 66.66% HOTA, and 76.42% MOTA. The Hi-LabSpermTracking dataset enables robust sperm tracking research, while the mean ensemble method improves accuracy by combining the strengths of the individual models.
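
For readers unfamiliar with two of the metrics reported above, the following is a minimal sketch of how F1-score and MOTA are conventionally computed from error counts. It uses the standard textbook definitions, not the authors' evaluation code; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """Standard MOTA definition: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence;
    num_gt is the total number of ground-truth objects across frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + id_switches) / num_gt


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(f"F1   = {f1_score(tp=90, fp=10, fn=10):.4f}")
    print(f"MOTA = {mota(fn=10, fp=10, id_switches=2, num_gt=100):.4f}")
```

Under these definitions MOTA penalizes missed detections, false positives, and identity switches equally. HOTA additionally weighs how well identities are associated over time, so the two tracking scores are not directly comparable.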