MACHINE VISION AND APPLICATIONS, cilt.30, sa.1, ss.23-40, 2019 (SCI-Expanded)
The primary motivation of computer vision in the robotics field is to obtain a perception level that is as close as possible to human visual system. To achieve this, the inclusion of large datasets is necessary, sometimes involving less-frequent and seemingly irrelevant data to increase the system robustness. To minimize the effort and time in forming such extensive datasets from real world, the preferred method is to utilize simulation environments, replicating real-world conditions as much as possible. Following this solution path, the machine vision problems in robotics (i.e., object detection, recognition, and manipulation) often employ synthetic images in datasets and, however, do not mix them with real-world images. When the systems are trained only using the synthetic images and tested within the simulated world, the tasks requiring object recognition in robotics can be accomplished. However, the systems trained using this procedure cannot be directly used in the real-world experiments or end-user products due to the inconsistencies between real and simulation environments. Therefore, we propose a hybrid image dataset including annotated desktop objects from real and synthetic worlds (ADORESet). This hybrid dataset provides purposeful object categories with a sufficient number of real and synthetic images. ADORESet is composed of colored images with the dimension of 300x300 pixels within 30 categories. Each class has 2500 real-world images acquired from the wild web and 750 synthetic images that are generated within Gazebo simulation environment. This hybrid dataset enables researchers to implement their own algorithms for both real-world and simulation environment conditions. ADORESet is composed of fully annotated object images. The limits of objects are manually specified, and the bounding box coordinates are provided. The successor objects are also labeled to give statistical information and the likelihood about the relations of the objects within the dataset. To further demonstrate the benefits of this dataset, it is tested in object recognition tasks by fine-tuning the state-of-the-art deep convolutional neural networks such as VGGNet, InceptionV3, ResNet, and Xception. The possible combinations regarding the data types for these models are compared in terms of time, accuracy, and loss values. As a result of the conducted object recognition experiments, training with all-real images yields approximately 49% validation accuracy for simulation images. When the training is performed with all-synthetic images and validated using all-real images, the accuracy becomes lower than 10%. If the complete ADORESet is employed for training and validation, the hybrid dataset validation accuracy reaches approximately to 95%. This result proves further that including the real and synthetic images together in the training and validation sessions increases the overall system accuracy and reliability.