Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping
arXiv cs.CV / 4/22/2026
Key Points
- The paper addresses a key bottleneck in high-resolution remote sensing mapping: directly fusing global geospatial foundation-model embeddings with local high-resolution features can cause feature interference and degrade spatial structure due to a semantic–spatial gap.
- It proposes a Structure-Semantic Decoupled Modulation (SSDM) framework that splits global representations into two cross-modal injection pathways: a structural prior modulation branch and a global semantic injection branch.
- The structural prior branch injects macroscopic receptive-field priors into self-attention layers of the high-resolution encoder to reduce fragmentation and stabilize local feature extraction under high-frequency noise and intra-class variance.
- The semantic injection branch aligns holistic context with the deep high-resolution feature space and supplements global semantics through cross-modal integration to improve semantic consistency and category discrimination.
- According to the authors' experiments on remote sensing tasks, SSDM achieves state-of-the-art results compared with existing cross-modal fusion approaches and improves mapping accuracy across multiple scenarios.
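
The two injection pathways described above can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not say how the structural prior is computed or how the semantic branch is fused, so the sketch assumes (1) the prior is an additive attention bias derived from the cosine similarity of per-token global embeddings, and (2) semantic injection is a gated residual cross-attention with no learned projections. All function names and the `strength` and `gate` parameters are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, bias=None):
    """Scaled dot-product attention weights with an optional additive bias."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    weights = []
    for i, row in enumerate(scores):
        shifted = [s * scale + (bias[i][j] if bias else 0.0) for j, s in enumerate(row)]
        weights.append(softmax(shifted))
    return weights

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def structural_bias(global_tokens, strength):
    """ASSUMPTION: token pairs whose global embeddings agree get a larger bias."""
    return [[strength * cosine(gi, gj) for gj in global_tokens] for gi in global_tokens]

def structure_modulated_self_attention(hr_tokens, global_tokens, strength=1.0):
    """Self-attention over high-resolution tokens, biased by the structural prior."""
    bias = structural_bias(global_tokens, strength)
    weights = attention_weights(hr_tokens, hr_tokens, bias)
    return matmul(weights, hr_tokens)

def semantic_injection(deep_tokens, global_tokens, gate=0.5):
    """ASSUMPTION: gated residual cross-attention; HR tokens query the global tokens."""
    weights = attention_weights(deep_tokens, global_tokens)
    context = matmul(weights, global_tokens)
    return [[d + gate * c for d, c in zip(d_row, c_row)]
            for d_row, c_row in zip(deep_tokens, context)]

# Toy input: four tokens, two feature dimensions. Tokens 0-1 and 2-3 belong to
# two different regions according to the (already resampled) global embedding.
hr = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
glob = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

mixed = structure_modulated_self_attention(hr, glob, strength=2.0)
fused = semantic_injection(mixed, glob, gate=0.5)
```

In this sketch the two branches stay separate: the global embedding only reshapes which high-resolution tokens attend to each other in the first step, and only contributes feature content in the second. Setting `strength=0.0` recovers plain self-attention, and `gate=0.0` leaves the deep features unchanged, which makes each pathway easy to ablate on its own.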



