Robust Change Captioning in Remote Sensing: SECOND-CC Dataset and MModalCC Framework


Karaca A. C., Ozelbas E., Berber S., Karimli O., Yıldırım T., Amasyalı M. F.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI: 10.1109/jstars.2025.3600613
  • Journal Name: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Aquatic Science & Fisheries Abstracts (ASFA), Compendex, Geobase, INSPEC, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: Change captioning, multimodal change captioning, remote sensing images
  • Yıldız Technical University Affiliation: Yes

Abstract

Existing remote sensing image change captioning (RSICC) methods often fail under challenges such as illumination differences, viewpoint changes, and blur effects, which leads to inaccuracies, especially in no-change regions. Moreover, image pairs acquired at different spatial resolutions or with registration errors tend to degrade caption quality. To address these issues, we introduce SECOND-CC, a novel RSICC dataset featuring high-resolution RGB image pairs, semantic segmentation maps, and diverse real-world scenarios. SECOND-CC contains 6 041 pairs of bitemporal remote sensing images and 30 205 sentences describing the differences between the images. Additionally, we propose MModalCC, a multimodal framework that integrates semantic and visual data using advanced attention mechanisms, including Cross-Modal Cross Attention and Multimodal Gated Cross Attention. We also adapt MModalCC to handle noisy semantic inputs by integrating a Semantic Change Detector, which improves its robustness for real-world applications. Detailed ablation studies and attention visualizations demonstrate its effectiveness and its ability to address the challenges of RSICC. Comprehensive experiments show that MModalCC outperforms state-of-the-art RSICC methods, including RSICCformer, Chg2Cap, and PSNet, with improvements of +4.6% in BLEU-4 and +9.6% in CIDEr on the SECOND-CC dataset. MModalCC was further validated on the LEVIR-MCI benchmark, where it achieved an average S*_m score of 83.51, significantly outperforming previous state-of-the-art methods. We will make our dataset and codebase publicly available to facilitate future research at https://github.com/ChangeCapsInRS/SecondCC.
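The S*_m score reported for LEVIR-MCI is, in the RSICC literature we are aware of, commonly defined as the arithmetic mean of four captioning metrics: BLEU-4, METEOR, ROUGE-L, and CIDEr. The minimal Python sketch below computes it under that assumption. The function name `s_star_m`, the metric key strings, and the example scores are hypothetical. They are not taken from the paper or its codebase, and the sketch assumes all four scores are already on the same percentage-style scale.

```python
from typing import Mapping

# Assumed component metrics of S*_m (mean of BLEU-4, METEOR, ROUGE-L, CIDEr),
# following the definition commonly used in RSICC papers. Key names are hypothetical.
REQUIRED_METRICS = ("BLEU-4", "METEOR", "ROUGE-L", "CIDEr")


def s_star_m(scores: Mapping[str, float]) -> float:
    """Return the mean of BLEU-4, METEOR, ROUGE-L and CIDEr.

    Assumes every score is on the same scale (e.g. BLEU-4 = 62.1,
    CIDEr = 134.5), as RSICC result tables usually report them.
    """
    missing = [name for name in REQUIRED_METRICS if name not in scores]
    if missing:
        raise KeyError(f"missing metric(s): {', '.join(missing)}")
    return sum(scores[name] for name in REQUIRED_METRICS) / len(REQUIRED_METRICS)


# Hypothetical scores for illustration only (not results from the paper).
example = {"BLEU-4": 60.0, "METEOR": 40.0, "ROUGE-L": 75.0, "CIDEr": 130.0}
print(s_star_m(example))  # 76.25
```

Because CIDEr typically runs well above 100 while the other three metrics stay below it, an unweighted mean like this one gives CIDEr gains a comparatively large influence on S*_m.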