29th IEEE Conference on Signal Processing and Communications Applications, SIU 2021, Virtual, Istanbul, Türkiye, 9-11 June 2021
© 2021 IEEE. This paper presents a deep learning based approach to dataset cleaning, demonstrated on the Google Landmark Challenge dataset v2, a noisy dataset consisting of images photographed by people. The study also lists improvements to image classification and recognition on large noisy datasets, centered mainly on this cleaning approach. Results are reported both quantitatively, with graphs showing the reduction in the number of classes and the number of noisy images eliminated, and qualitatively, through visual analysis of those images. The study concludes that noisy samples can be removed from a dataset using the confidence score outputs of a deep learning network. For better reproducibility, the paper also gives the specific threshold values behind the results achieved on this dataset with the described model architecture.
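As a rough illustration of confidence-based cleaning, the sketch below drops samples whose confidence score for their own label falls below a threshold, then drops classes left with too few images. It is not the paper's implementation: the function name `clean_dataset`, the tuple layout, the threshold of 0.5, and the minimum class size of 2 are all hypothetical placeholders, since the abstract does not state the actual values or the exact class-reduction rule.

```python
from collections import Counter


def clean_dataset(samples, confidence_threshold=0.5, min_images_per_class=2):
    """Filter noisy samples by classifier confidence.

    samples: list of (image_id, label, confidence) tuples, where confidence
    is assumed to be the network's score for the sample's own label.
    Both default parameter values are illustrative, not taken from the paper.
    """
    # Step 1: keep only samples the network is sufficiently confident about.
    kept = [s for s in samples if s[2] >= confidence_threshold]

    # Step 2 (assumed rule): drop classes that end up with too few images.
    counts = Counter(label for _, label, _ in kept)
    kept = [s for s in kept if counts[s[1]] >= min_images_per_class]

    # Report which image ids were eliminated overall.
    removed_ids = {s[0] for s in samples} - {s[0] for s in kept}
    return kept, removed_ids
```

In practice the confidence scores would come from a trained classification network, and the two parameters would be tuned on the dataset at hand.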