Applied Sciences (Switzerland), vol. 15, no. 20, 2025 (SCI-Expanded, Scopus)
The inspection and maintenance of urban sewer infrastructure remain critical challenges for megacities, where conventional manual inspection is labor-intensive, time-consuming, and prone to human error. Although deep learning has been increasingly applied to sewer inspection, the field lacks both a publicly available large-scale dataset and a systematic evaluation of CNN and transformer-based models on real sewer footage. This study aims to systematically evaluate and compare state-of-the-art deep learning architectures for automated sewer defect detection using a newly introduced dataset. We present the Istanbul Sewer Defect Dataset (ISWDS), comprising 13,491 expert-annotated images collected from Istanbul’s wastewater network and covering eight defect categories that account for approximately 90% of reported failures. The novelty of this work lies in both the introduction of the ISWDS and the first systematic benchmarking of YOLO (v8/11/12) and RT-DETR (v1/v2) architectures under identical protocols on real sewer inspection footage. Experimental results show that RT-DETR v2 achieves the best performance (F1: 79.03%, Recall: 81.10%), significantly outperforming the best YOLO variant. While transformer-based architectures excel at detecting partially occluded defects and at handling complex operational conditions, YOLO models offer computational efficiency advantages for resource-constrained deployments. Furthermore, a QGIS-based inspection tool integrating the best-performing models was developed to enable real-time video analysis and automated reporting. Overall, this study highlights the trade-off between accuracy and efficiency: RT-DETR v2 is most suitable for server-based processing, whereas compact YOLO variants are more appropriate for edge deployment.
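For readers less familiar with the reported metrics, the following minimal Python sketch shows how F1 relates to precision and recall. The function names are illustrative and not taken from the study. The back-calculated precision assumes that the reported F1 and recall come from the same operating point and are not averaged per class; the abstract does not state this, so the value is a rough illustration rather than a reported result.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denom


# Values reported in the abstract for RT-DETR v2.
reported_f1 = 0.7903
reported_recall = 0.8110

# Assumption: both metrics describe the same operating point.
precision = implied_precision(reported_f1, reported_recall)
print(f"implied precision: {precision:.2%}")
```

Under that assumption the implied precision is roughly 77%, which would mean the model is tuned slightly toward recall, i.e. toward missing fewer defects at the cost of some additional false alarms.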