Classifying Hyperspectral Images: A Comparison of Vision Transformer and Convolution Based Models
(Original Turkish title: Hiperspektral Görüntülerin Sınıflandırılması: Görü Dönüştürücü ve Evrişim Temelli Modellerin Karşılaştırılması)

Arpaci S. A., BİLGİN G.

2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, Bursa, Türkiye, 10-12 September 2025 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/asyu67174.2025.11208305
City of Publication: Bursa
Country of Publication: Türkiye
Keywords: classification, deep neural networks, hyperspectral image, hyperspectral image classification, remote sensing
Affiliated with Yıldız Technical University: Yes

Abstract

This study investigates the classification of high-dimensional hyperspectral images. Ten models were evaluated on the publicly available Pavia Centre, Salinas, and Indian Pines datasets: MobileNeXt, VAN (Visual Attention Network), CTMixer (Convolution Transformer Mixer), MSSTT (Multiscale Super Token Transformer), CvT (Convolutional vision Transformer), ResNet50, ResNet101, MobileNet, ViT (Vision Transformer), and C3D (Convolutional 3D). Before training, each dataset was preprocessed: Principal Component Analysis (PCA) was applied to reduce the data dimensionality, 3D cubes were then extracted from the reduced data, and finally the data were normalized. Performance was evaluated using Overall Accuracy (OA), Average Accuracy (AA), Kappa, Precision, Recall, and F1-score, and the training time of each model was also measured. In the experiments, CTMixer, VAN, and MSSTT achieved the best results on all datasets, while CvT and ViT achieved lower success rates than the other models. MSSTT combined high success rates with a short training time relative to the other models. Beyond comparing the ten models, the study contributes to the literature by assessing how models originally designed for optical image classification (MobileNeXt, VAN, CvT, ResNet50, ResNet101, MobileNet, and ViT) and for video classification (C3D) compare in hyperspectral image classification.
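
The preprocessing order described in the abstract (PCA, then 3D cube extraction, then normalization) and three of the reported metrics (OA, AA, Kappa) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the number of components, the patch size, reflect padding at the image border, global min-max scaling, and the convention that label 0 means "unlabelled" are all assumptions made for illustration, and the input is random synthetic data rather than any of the three datasets.

```python
import numpy as np


def pca_reduce(image, n_components):
    """Project an (H, W, B) image onto its top principal components."""
    h, w, b = image.shape
    flat = image.reshape(-1, b).astype(np.float64)
    flat -= flat.mean(axis=0)  # centre each spectral band
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)


def extract_cubes(image, labels, patch_size):
    """Cut a (patch_size, patch_size, C) cube around every labelled pixel.

    Assumes patch_size is odd and that label 0 marks unlabelled pixels;
    the returned class indices start at 0.
    """
    margin = patch_size // 2
    padded = np.pad(image, ((margin, margin), (margin, margin), (0, 0)),
                    mode="reflect")
    rows, cols = np.nonzero(labels)
    cubes = np.stack([padded[r:r + patch_size, c:c + patch_size]
                      for r, c in zip(rows, cols)])
    return cubes, labels[rows, cols] - 1


def min_max_normalize(cubes):
    """Scale all values into [0, 1] (one assumed choice of normalization)."""
    lo, hi = cubes.min(), cubes.max()
    return (cubes - lo) / (hi - lo) if hi > lo else np.zeros_like(cubes)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def oa_aa_kappa(cm):
    """Overall Accuracy, Average Accuracy, and Cohen's Kappa."""
    total = cm.sum()
    oa = np.trace(cm) / total
    # AA is the mean of per-class recall; guard against empty classes.
    aa = (np.diag(cm) / np.maximum(cm.sum(axis=1), 1)).mean()
    # Chance agreement; Kappa is undefined if this equals 1.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Synthetic stand-in for a hyperspectral scene: 20 x 20 pixels, 30 bands.
rng = np.random.default_rng(0)
image = rng.random((20, 20, 30))
labels = rng.integers(0, 4, size=(20, 20))  # 0 = unlabelled, 1..3 = classes

reduced = pca_reduce(image, n_components=5)
cubes, y = extract_cubes(reduced, labels, patch_size=7)
cubes = min_max_normalize(cubes)
```

The resulting `cubes` array has shape `(n_labelled_pixels, 7, 7, 5)` and could be fed to a 2D or 3D network after a model-specific reshape. `oa_aa_kappa` takes the confusion matrix built from a model's predictions on the held-out pixels.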