IEEE Geoscience and Remote Sensing Letters, vol. 23, 2026 (SCI-Expanded, Scopus)
Remote sensing change captioning describes land surface changes between bi-temporal images. However, models trained on RGB inputs often underperform their multispectral (MS) counterparts because of limited spectral awareness. We present X-Change, an RGB-inspired spectral-aware framework that achieves MS-level descriptive quality while jointly performing multi-task segmentation to predict change, NDVI, and NDWI masks, which indicate where and how change occurs. Unlike prior RGB-based methods, X-Change employs rule-based spectral supervision derived from bi-temporal Sentinel-2 data, enabling its shared encoder to internalize NDVI/NDWI-related cues for both the captioning and segmentation tasks. Experiments on the MOSAIC-SEN2-CC dataset show that X-Change surpasses state-of-the-art RGB-based models and matches or slightly exceeds those trained on multispectral inputs, while producing spatially consistent change and index maps. Overall, X-Change bridges the gap between RGB and MS modalities, offering an interpretable and practical framework for spectral-aware, multi-task, and explainable change captioning. The codebase will be publicly released at https://github.com/ChangeCapsInRS/X-Change.
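To illustrate what rule-based spectral supervision from Sentinel-2 bands could look like, the following is a minimal sketch rather than the authors' implementation. It uses the standard definitions NDVI = (NIR - Red) / (NIR + Red) and the green/NIR form of NDWI = (Green - NIR) / (Green + NIR), computed on Sentinel-2 bands B8 (NIR), B4 (red), and B3 (green). The function names, the thresholds (0.3 for NDVI, 0.0 for NDWI), and the toy reflectance values are assumptions made for this example; the abstract does not state the actual rules or the NDWI variant used.

```python
import numpy as np


def normalized_difference(a, b, eps=1e-6):
    """Return (a - b) / (a + b), with eps guarding against division by zero."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return (a - b) / (a + b + eps)


def spectral_masks(red, green, nir, ndvi_thresh=0.3, ndwi_thresh=0.0):
    """Threshold NDVI and NDWI into binary vegetation and water masks.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    ndvi = normalized_difference(nir, red)    # high over vegetation
    ndwi = normalized_difference(green, nir)  # high over open water
    return ndvi > ndvi_thresh, ndwi > ndwi_thresh


def mask_change(mask_before, mask_after):
    """Mark pixels whose mask membership differs between the two dates."""
    return np.logical_xor(mask_before, mask_after)


# Toy 2x2 surface reflectances (hypothetical values).
# Layout at t1: [[vegetation, vegetation], [bare soil, water]]
red_t1 = np.array([[0.05, 0.05], [0.20, 0.03]])
green_t1 = np.array([[0.08, 0.08], [0.15, 0.06]])
nir_t1 = np.array([[0.45, 0.45], [0.25, 0.02]])

# At t2 the top-right pixel turns from vegetation into bare soil.
red_t2 = np.array([[0.05, 0.20], [0.20, 0.03]])
green_t2 = np.array([[0.08, 0.15], [0.15, 0.06]])
nir_t2 = np.array([[0.45, 0.25], [0.25, 0.02]])

veg_t1, water_t1 = spectral_masks(red_t1, green_t1, nir_t1)
veg_t2, water_t2 = spectral_masks(red_t2, green_t2, nir_t2)

veg_change = mask_change(veg_t1, veg_t2)
water_change = mask_change(water_t1, water_t2)
```

In a setup like the one the abstract describes, masks of this kind would serve as segmentation targets derived from the multispectral data, so that a model receiving only RGB input is still supervised with spectral information.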