IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025 (SCI-Expanded)
Remote Sensing Image Change Captioning (RSICC) aims to generate descriptive sentences that characterize the changes between bi-temporal images. State-of-the-art methods predict captions from RGB image pairs, whereas change captioning for multispectral images has not yet been investigated. To address this gap, we created a new dataset, MOSAIC-SEN2-CC, which contains 5 232 pairs of multispectral (MS) images captured by Sentinel-2 satellites and 26 160 change captions, covering a 12-month period. The dataset comprises eight categories: Wildfire (WF), Flood (FL), Wetland (WET), Green Field (GF), Glacier (GL), Urban (UR), Agriculture (AG), and No-Change (NO). We propose a Multispectral Image Change Captioning (MSICC) framework consisting of a BigEarthNet Feature Extractor, a Feature Enhancement module, and a Transformer-Based Decoder, designed to exploit spectral band information. Specifically, state-of-the-art methods such as RSICCformer, Chg2Cap, and PSNet are adapted to work with BigEarthNet models on 10-band spectral images. Detailed comparisons, including attention visualizations, RGB versus MS trade-offs, change captions, and performance metrics, demonstrate the effectiveness of the proposed framework and its ability to address RSICC challenges. To facilitate future research, we will make our dataset and codebase publicly available at https://github.com/ChangeCapsInRS/MOSAIC-SEN2-CC.
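The dataset figures above can be made concrete with a small sketch. It is illustrative only and is not taken from the authors' codebase: the 120 x 120 patch size, the channel-first layout, the RGB band positions, and the helper names `make_dummy_pair` and `rgb_subset` are all assumptions. Only the pair count, caption count, band count, and category abbreviations come from the abstract.

```python
import numpy as np

# Figures stated in the abstract.
NUM_PAIRS = 5232
NUM_CAPTIONS = 26160
NUM_BANDS = 10
CATEGORIES = ["WF", "FL", "WET", "GF", "GL", "UR", "AG", "NO"]

# Average number of captions per bi-temporal pair implied by those totals.
captions_per_pair = NUM_CAPTIONS / NUM_PAIRS


def make_dummy_pair(height=120, width=120, bands=NUM_BANDS, seed=0):
    """Return a random (before, after) pair shaped like a 10-band MS input.

    The patch size and channel-first layout are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    before = rng.random((bands, height, width), dtype=np.float32)
    after = rng.random((bands, height, width), dtype=np.float32)
    return before, after


def rgb_subset(image, rgb_indices=(2, 1, 0)):
    """Select three bands as an RGB view of an MS image.

    The default indices are hypothetical; the real positions depend on how
    the bands are ordered in the stored files.
    """
    return image[list(rgb_indices)]


before, after = make_dummy_pair()
print(captions_per_pair, before.shape, rgb_subset(before).shape)
```

Under these assumptions, an RGB baseline receives 3 of the 10 channels available to an MS model, which is the input-side difference behind the RGB versus MS comparison.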