December 2025

A New Multimodal Change Captioning Dataset and Research Paper from the MOSAIC Research Group

We at the MOSAIC Research Group are pleased to announce that our new study has been published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS):

📌 Robust Change Captioning in Remote Sensing: SECOND-CC Dataset and MModalCC Framework

🔗 https://ieeexplore.ieee.org/document/11130644


This study addresses real-world conditions such as blur, illumination differences, viewpoint changes, and resolution mismatches, under which existing remote sensing image change captioning (RSICC) methods often struggle. To tackle these issues, we introduce the SECOND-CC dataset, which consists of 6,041 high-resolution RGB image pairs with corresponding semantic segmentation maps and 30,205 human-written captions (five per pair). The dataset covers both change and no-change cases across 28 distinct change categories.
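To make the dataset layout concrete, here is a minimal Python sketch of how one SECOND-CC sample might be represented in code. The class and field names (`ChangeCaptionSample`, `image_t1`, `semantic_t2`, `changed`, and so on) are hypothetical and do not come from an official loader; only the five-captions-per-pair rule and the pair and caption counts are taken from the description above.

```python
from dataclasses import dataclass
from typing import List

CAPTIONS_PER_PAIR = 5  # five human-written captions per image pair


@dataclass
class ChangeCaptionSample:
    """One bi-temporal sample. Field names are hypothetical, not an official API."""
    image_t1: str        # path to the earlier RGB image
    image_t2: str        # path to the later RGB image
    semantic_t1: str     # path to the semantic segmentation map for the earlier date
    semantic_t2: str     # path to the semantic segmentation map for the later date
    captions: List[str]  # human-written captions describing the change (or no change)
    changed: bool        # True for change pairs, False for no-change pairs

    def __post_init__(self) -> None:
        # Enforce the five-captions-per-pair convention.
        if len(self.captions) != CAPTIONS_PER_PAIR:
            raise ValueError(
                f"expected {CAPTIONS_PER_PAIR} captions, got {len(self.captions)}"
            )


def total_captions(num_pairs: int) -> int:
    # Each pair contributes exactly CAPTIONS_PER_PAIR captions.
    return num_pairs * CAPTIONS_PER_PAIR


print(total_captions(6041))  # 30205, matching the stated caption count
```

The consistency check shows that 6,041 pairs with five captions each account for exactly the 30,205 captions in the dataset.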

The proposed MModalCC architecture uses Siamese encoders for the RGB images and the semantic maps, together with a decoder built on three attention mechanisms: Cross-Modal Cross Attention (CMCA), Unimodal Difference Cross Attention (UDCA), and Multimodal Gated Cross Attention (MGCA). A semantic change detector is also integrated to handle noisy semantic inputs.
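For readers less familiar with these building blocks, the following NumPy sketch illustrates the general ideas the module names refer to: a Siamese (weight-shared) encoder applied to both dates, bi-temporal difference features within one modality, cross-attention from one modality to the other, and a sigmoid gate that fuses the two streams. It is a toy under our own assumptions, not the actual MModalCC code: every function name, tensor size, the single-head attention, and the gating formula are hypothetical, random arrays stand in for real image and segmentation-map features, and the caption decoder and semantic change detector are omitted. Please refer to the paper for the exact module definitions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def siamese_encode(patches: np.ndarray, w_enc: np.ndarray) -> np.ndarray:
    # Toy linear + tanh encoder; the SAME weights are used for both dates (Siamese).
    return np.tanh(patches @ w_enc)


def cross_attention(queries, context, w_q, w_k, w_v):
    # Single-head scaled dot-product attention:
    # queries come from one stream, keys/values from another.
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def gated_fusion(a, b, w_gate):
    # Element-wise sigmoid gate deciding how much of each stream to keep.
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([a, b], axis=-1) @ w_gate)))
    return g * a + (1.0 - g) * b


rng = np.random.default_rng(0)
n_tokens, d_in, d = 16, 8, 32  # arbitrary toy sizes

# Random stand-ins for patch features of the two dates in each modality.
rgb_t1, rgb_t2 = rng.normal(size=(2, n_tokens, d_in))
sem_t1, sem_t2 = rng.normal(size=(2, n_tokens, d_in))

# One encoder per modality, shared across the two dates.
w_rgb = rng.normal(scale=d_in ** -0.5, size=(d_in, d))
w_sem = rng.normal(scale=d_in ** -0.5, size=(d_in, d))
rgb_diff = siamese_encode(rgb_t2, w_rgb) - siamese_encode(rgb_t1, w_rgb)
sem_diff = siamese_encode(sem_t2, w_sem) - siamese_encode(sem_t1, w_sem)

w_q, w_k, w_v = rng.normal(scale=d ** -0.5, size=(3, d, d))
w_gate = rng.normal(scale=(2 * d) ** -0.5, size=(2 * d, d))

# RGB difference tokens attend to semantic difference tokens,
# then the original and attended RGB streams are gated together.
rgb_from_sem = cross_attention(rgb_diff, sem_diff, w_q, w_k, w_v)
fused = gated_fusion(rgb_diff, rgb_from_sem, w_gate)
print(fused.shape)  # (16, 32)
```

The gate lets the model lean on the semantic stream where it is informative and fall back to RGB evidence elsewhere, which is one plausible way to limit the influence of noisy semantic inputs.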

Experimental results show that MModalCC achieves significant improvements over RSICCformer, Chg2Cap, and PSNet on SECOND-CC and substantially outperforms state-of-the-art methods on the related LEVIR-MCI benchmark.