Detection of sperm cells by single-stage and two-stage deep object detectors


Yüzkat M., Ilhan H. O., Aydin N.

Biomedical Signal Processing and Control, vol. 83, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 83
  • Publication Date: 2023
  • DOI: 10.1016/j.bspc.2023.104630
  • Journal Name: Biomedical Signal Processing and Control
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, EMBASE, INSPEC
  • Keywords: Infertility, Single-stage object detection, Sperm detection, Two-stage object detection, Video processing
  • Yıldız Technical University Affiliated: Yes

Abstract

© 2023 Elsevier Ltd.

Today, infertility is a common health concern affecting approximately 15%–20% of the world's population. The evaluation of male patients with infertility involves different examinations and laboratory tests than that of female patients. In the evaluation of male infertility, sperm specimens are examined in terms of morphometry, concentration, and motility. Detecting sperm is the critical step in determining the concentration and motility parameters. In this study, a fusion approach combining deep learning-based object detection techniques is applied to the sperm detection problem to obtain more accurate and consistent concentration and motility characteristics of sperm specimens. First, 12 sperm specimen videos acquired from different infertile patients were recorded. Then, regions in the video frames were labeled as sperm or non-sperm by experts. Two scenarios were tested on the labeled dataset, each with differently arranged train/test data ratios. In the first scenario, deep learning-based object detection algorithms were applied individually to the sperm detection task on patient-oriented videos under different train/test split ratios. The videos yielding the lowest mAP (mean Average Precision) values in the first scenario were selected as target videos for the second scenario, and the remaining videos were used for model training without any patient-oriented constraint. The second scenario aims to increase detection performance on these more challenging videos by training on different videos rather than on a small portion of the same video. Additionally, a fusion of the utilized deep learning-based techniques was proposed and evaluated on these low-mAP videos to further improve detection performance. The results are compared in terms of overall mAP, class-wise APs, and training time. In the first scenario, YOLOv5 achieved the best results, while the proposed fusion approach achieved the best mAP scores in the second scenario.
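The detectors above are compared by mAP, i.e., the mean of the class-wise Average Precisions (here, for the sperm and non-sperm classes). A minimal sketch of that metric follows; the IoU threshold of 0.5, the greedy matching rule, and the toy boxes are illustrative assumptions, not the paper's evaluation code.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class: detections = [(score, box)], ground_truth = [box]."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched, tp, fp = set(), [], []
    for score, box in detections:
        # Greedily match each detection to the best unmatched ground truth.
        best, best_i = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            o = iou(box, gt)
            if o > best:
                best, best_i = o, i
        if best >= iou_thr:
            matched.add(best_i)
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)
    # AP as area under the precision-recall curve (simple rectangle rule).
    ap, cum_tp, cum_fp, prev_recall = 0.0, 0, 0, 0.0
    for t, f in zip(tp, fp):
        cum_tp += t; cum_fp += f
        recall = cum_tp / len(ground_truth)
        precision = cum_tp / (cum_tp + cum_fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def mean_average_precision(per_class):
    """mAP = mean of class-wise APs (e.g., over 'sperm' and 'non-sperm')."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps)
```

With this definition, a model that detects one of two ground-truth objects and adds one false positive scores an AP of 0.5 for that class, and the mAP averages such per-class scores exactly as the paper's class-wise comparison does.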