IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 25410-25426, 2025 (SCI-Expanded)
Remote Sensing Image Change Captioning (RSICC) aims to generate descriptive sentences that characterize the changes between bitemporal images. Although state-of-the-art methods focus on predicting captions from RGB image pairs, change captioning in multispectral images has not yet been investigated. To address this gap, we created a new dataset, MOSAIC-SEN2-CC, which contains 5232 pairs of multispectral (MS) images captured by Sentinel-2 satellites over a 12-month period, together with 26 160 change captions. The dataset comprises eight categories: Wildfire (WF), Flood (FL), Wetland (WET), Green Field (GF), Glacier (GL), Urban (UR), Agriculture (AG), and No-Change (NO). In this article, we propose a Multispectral Image Change Captioning framework that consists of BigEarthNet Feature Extractor, Feature Enhancement, and Transformer-Based Decoder modules to exploit spectral band information. Specifically, state-of-the-art methods such as RSICCformer, Chg2Cap, and PSNet are adapted to work with BigEarthNet models using images with ten spectral bands. Detailed comparisons, including attention visualizations, RGB versus MS tradeoffs, change captions, and performance metrics, demonstrate the effectiveness of the proposed framework and its ability to address RSICC challenges.