📌 Robust Change Captioning in Remote Sensing: SECOND-CC Dataset and MModalCC Framework
🔗 https://ieeexplore.ieee.org/document/11130644
This study targets real-world conditions under which existing remote sensing image change captioning (RSICC) methods often struggle: blur, illumination differences, viewpoint changes, and resolution mismatches. To address these issues, the MOSAIC Research Group introduces the SECOND-CC dataset, which consists of 6,041 high-resolution RGB image pairs, their semantic segmentation maps, and 30,205 human-written captions (five per pair). The dataset covers both change and no-change cases across 28 distinct change categories.
The proposed MModalCC architecture employs Siamese encoders for the RGB images and the semantic maps, along with a decoder built on Cross-Modal Cross Attention (CMCA), Unimodal Difference Cross Attention (UDCA), and Multimodal Gated Cross Attention (MGCA). Additionally, a semantic change detector is integrated to handle noisy semantic inputs.
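To make the idea of gated cross attention more concrete, here is a minimal, generic sketch in plain Python. It is not the paper's MGCA formulation: the function names, the identity projections (no learned query/key/value weights), the "after minus before" difference tokens, and the per-token scalar sigmoid gate are all assumptions chosen for illustration. Caption tokens attend separately to RGB and semantic difference features, and a gate then blends the two contexts.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention with identity projections.

    queries:     n_q x d (e.g. caption token states)
    keys_values: n_k x d (e.g. image difference tokens)
    Returns the attended context (n_q x d) and the attention weights (n_q x n_k).
    """
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys_values)]  # d x n_k
    scores = matmul(queries, keys_t)                   # n_q x n_k
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, keys_values), weights


def gated_fusion(rgb_ctx, sem_ctx, gate_w):
    """Blend two contexts per token: out = g * rgb + (1 - g) * sem.

    g is a scalar in (0, 1) computed from the concatenated contexts.
    This gating form is an assumption, not taken from the paper.
    """
    fused = []
    for r, s in zip(rgb_ctx, sem_ctx):
        logit = sum(w * v for w, v in zip(gate_w, r + s))
        g = 1.0 / (1.0 + math.exp(-logit))
        fused.append([g * a + (1.0 - g) * b for a, b in zip(r, s)])
    return fused


def random_tokens(n, d):
    """Hypothetical stand-in for encoder outputs: n random d-dim tokens."""
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


random.seed(0)
d = 4
words = random_tokens(3, d)  # three caption token states

# Bi-temporal features per modality; difference = after - before (assumption).
rgb_before, rgb_after = random_tokens(5, d), random_tokens(5, d)
sem_before, sem_after = random_tokens(5, d), random_tokens(5, d)
rgb_diff = [[a - b for a, b in zip(x, y)] for x, y in zip(rgb_after, rgb_before)]
sem_diff = [[a - b for a, b in zip(x, y)] for x, y in zip(sem_after, sem_before)]

rgb_ctx, rgb_weights = cross_attention(words, rgb_diff)
sem_ctx, sem_weights = cross_attention(words, sem_diff)

gate_w = [random.gauss(0.0, 0.1) for _ in range(2 * d)]  # hypothetical gate weights
fused = gated_fusion(rgb_ctx, sem_ctx, gate_w)
```

Because the gate is a convex combination, each fused value lies between the corresponding RGB and semantic context values. One plausible motivation for such a gate is that it lets a decoder lean on the RGB context when the semantic maps are noisy, though how MModalCC actually does this is described in the paper itself.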
Experimental results demonstrate that MModalCC achieves significant improvements over RSICCformer, Chg2Cap, and PSNet on the SECOND-CC dataset, and substantially outperforms state-of-the-art methods on a related benchmark, LEVIR-MCI.