MM-OVSeg: Multimodal Optical-SAR Fusion for Open-Vocabulary Segmentation in Remote Sensing
arXiv cs.CV / 3/19/2026
Key Points
- MM-OVSeg is introduced as a multimodal Optical-SAR fusion framework designed for resilient open-vocabulary segmentation in remote sensing, capable of operating under cloudy or haze-contaminated conditions.
- The method features a cross-modal unification process to align representations across sensors and a dual-encoder fusion module that integrates hierarchical features from multiple vision foundation models for text-aligned segmentation.
- Extensive experiments show improved robustness and generalization across diverse cloud conditions, addressing the cross-modal domain gap and dense prediction challenges of current vision-language models.
- The framework leverages optical imagery for rich spectral semantics while exploiting SAR's cloud-penetrating structural cues, and the authors release the source dataset and code.
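To make the general idea behind these points concrete, here is a minimal sketch of text-aligned labeling over fused optical and SAR features. It is not the authors' implementation: this summary does not describe the cross-modal unification process or the dual-encoder fusion module in detail, so a simple per-pixel weighted average stands in for the fusion step. The function names (`fuse`, `segment`), the `optical_weights` reliability values, and the toy 2-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def fuse(optical_feat, sar_feat, optical_weight):
    """Hypothetical fusion: a convex combination of two per-pixel feature vectors.
    A stand-in for a learned fusion module, not the paper's method."""
    w = optical_weight
    return [w * o + (1.0 - w) * s for o, s in zip(optical_feat, sar_feat)]

def segment(optical, sar, optical_weights, text_embeddings):
    """Label each pixel with the class whose text embedding is most similar
    (by cosine similarity) to the fused pixel feature."""
    names = list(text_embeddings)
    text = [l2_normalize(text_embeddings[n]) for n in names]
    labels = []
    for o, s, w in zip(optical, sar, optical_weights):
        f = l2_normalize(fuse(o, s, w))
        scores = [sum(a * b for a, b in zip(f, t)) for t in text]
        labels.append(names[scores.index(max(scores))])
    return labels

# Toy data (assumed): three pixels with 2-dimensional features.
# The class list is open-vocabulary in the sense that it is supplied at
# inference time as text embeddings rather than fixed during training.
text_embeddings = {"water": [1.0, 0.0], "building": [0.0, 1.0]}
optical = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]  # pixel 2 is ambiguous, as if cloud-covered
sar = [[0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
optical_weights = [0.8, 0.1, 0.8]               # low trust in optical at the cloudy pixel

labels = segment(optical, sar, optical_weights, text_embeddings)
print(labels)  # ['water', 'building', 'building']
```

In this toy case the second pixel's optical feature is uninformative, so the SAR feature decides its label. That mirrors the motivation stated above, although a real system would learn the fusion rather than rely on hand-set weights.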




