Prior-guided Fusion of Multimodal Features for Change Detection from Optical-SAR Images

arXiv cs.CV / 4/8/2026


Key Points

  • The paper introduces STSF-Net, a multimodal change detection framework for optical–SAR remote sensing that jointly models modality-specific and spatio-temporal common features to improve fine-grained semantic change representation.
  • It uses modality-specific signals to capture true semantic changes while embedding spatio-temporal common features to suppress pseudo-changes caused by differing optical and SAR imaging mechanisms.
  • STSF-Net adds an adaptive optical/SAR feature fusion strategy that reweights features using semantic priors derived from pre-trained foundation models, enabling semantic-guided fusion of multimodal information.
  • The authors present Delta-SN6, described as the first openly accessible multiclass benchmark with VHR fully polarimetric SAR and optical image pairs for optical–SAR MMCD.
  • Experiments on Delta-SN6, BRIGHT, and Wuhan-Het report improvements over the state of the art of 3.21%, 1.08%, and 1.32% in mIoU, respectively; the code and dataset are planned for release via the provided GitHub link.
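The reported gains are in mean intersection-over-union (mIoU), the standard metric for multiclass change detection. As a reference point, a minimal mIoU computation over per-pixel class maps looks like this (a generic sketch, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in either map."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
score = mean_iou(pred, target, num_classes=3)
print(score)  # 0.666... (IoUs: 1.0, 0.5, 0.5)
```

Benchmark details such as class definitions and ignore labels vary per dataset, so exact numbers depend on each benchmark's evaluation protocol.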

Abstract

Multimodal change detection (MMCD) identifies changed areas in multimodal remote sensing (RS) data, demonstrating significant application value in land use monitoring, disaster assessment, and urban sustainable development. However, existing MMCD approaches exhibit limitations in cross-modal interaction and in exploiting modality-specific characteristics. This leads to insufficient modeling of fine-grained change information, thus hindering the precise detection of semantic changes in multimodal data. To address these problems, we propose STSF-Net, a framework designed for MMCD between optical and SAR images. STSF-Net jointly models modality-specific and spatio-temporal common features to enhance change representations. Specifically, modality-specific features are exploited to capture genuine semantic change signals, while spatio-temporal common features are embedded to suppress pseudo-changes caused by differences in imaging mechanisms. Furthermore, we introduce an optical and SAR feature fusion strategy that adaptively adjusts feature importance based on semantic priors obtained from pre-trained foundation models, enabling semantic-guided adaptive fusion of multimodal information. In addition, we introduce the Delta-SN6 dataset, the first openly accessible multiclass MMCD benchmark consisting of very-high-resolution (VHR) fully polarimetric SAR and optical images. Experimental results on the Delta-SN6, BRIGHT, and Wuhan-Het datasets demonstrate that our method outperforms the state of the art (SOTA) by 3.21%, 1.08%, and 1.32% in mIoU, respectively. The associated code and Delta-SN6 dataset will be released at: https://github.com/liuxuanguang/STSF-Net.
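The semantic-guided fusion described above can be illustrated with a minimal sketch: per-pixel gates derived from a semantic-prior embedding reweight the optical and SAR feature maps before they are combined. The function name, the learned projection `W_gate`, and the two-way softmax gating are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_guided_fusion(f_opt, f_sar, prior, W_gate):
    """Fuse optical/SAR features with gates derived from a semantic prior.

    f_opt, f_sar : (H, W, C) feature maps from each modality.
    prior        : (H, W, P) semantic-prior embedding (e.g. from a
                   pre-trained foundation model) - assumed shape.
    W_gate       : (P, 2) hypothetical learned projection mapping the
                   prior to two per-pixel gate logits.
    """
    logits = prior @ W_gate                  # (H, W, 2)
    gates = softmax(logits, axis=-1)         # per-pixel modality weights, sum to 1
    return gates[..., :1] * f_opt + gates[..., 1:] * f_sar

# Toy example with random features; shapes are arbitrary.
rng = np.random.default_rng(0)
H, Wd, C, P = 4, 4, 8, 16
fused = semantic_guided_fusion(rng.normal(size=(H, Wd, C)),
                               rng.normal(size=(H, Wd, C)),
                               rng.normal(size=(H, Wd, P)),
                               rng.normal(size=(P, 2)))
print(fused.shape)  # (4, 4, 8)
```

Because the gates sum to one at every pixel, the fused map stays on the same scale as its inputs while letting the prior decide, pixel by pixel, which modality to trust more.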