OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding
arXiv cs.CV / 3/27/2026
Key Points
- The paper introduces Cross-Domain Remote Sensing Visual Grounding (CD-RSVG), the task of localizing targets from natural-language descriptions across sensor domains (e.g., optical vs. SAR imagery), a setting that prior methods largely could not handle.
- It builds what it claims is the first large-scale benchmark dataset for this setting (OptSAR-RSVG) and evaluates on OptSAR-RSVG and DIOR-RSVG.
- OptiSAR-Net++ is proposed as a transformer-free framework, using patch-level Low-Rank Adaptation Mixture-of-Experts (PL-MoE) to efficiently decouple and model cross-domain features.
- To avoid the computational cost of transformer decoding, the method shifts to a CLIP-style contrastive cross-modal matching approach with dynamic adversarial negative sampling.
- Two additional components, text-guided dual-gate fusion and a region-aware auxiliary head, are added to improve semantic-visual alignment and spatial modeling. The authors report state-of-the-art localization accuracy and efficiency, and plan to release the code and data publicly.
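
The contrastive matching idea in the key points can be illustrated with a small sketch. This is not the paper's implementation: the function names (`match_region`, `hardest_negatives`, `info_nce`), the toy embeddings, and the temperature of 0.07 are all assumptions made for illustration. The "hardest negatives" selection here is a simple stand-in for the dynamic adversarial negative sampling that the summary mentions but does not describe. The sketch scores candidate region embeddings against a text embedding by cosine similarity, picks the best match, and computes an InfoNCE-style loss against the most confusable non-target regions.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def match_region(text_emb, region_embs):
    """Return the index of the region most similar to the text, plus all scores."""
    scores = [cosine(text_emb, r) for r in region_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

def hardest_negatives(scores, positive_idx, k):
    """Pick the k non-target regions most similar to the text.

    Assumption: a simplified stand-in for adversarial negative sampling.
    """
    negatives = [i for i in range(len(scores)) if i != positive_idx]
    negatives.sort(key=lambda i: scores[i], reverse=True)
    return negatives[:k]

def info_nce(scores, positive_idx, negative_idxs, temperature=0.07):
    """InfoNCE loss over the positive and the selected negatives."""
    logits = [scores[positive_idx] / temperature]
    logits += [scores[i] / temperature for i in negative_idxs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Toy data (hypothetical): one text query and four candidate regions.
text = [1.0, 0.0, 0.5]
regions = [
    [0.9, 0.1, 0.4],    # the referred target
    [0.8, 0.3, 0.1],    # visually similar distractor (hard negative)
    [-1.0, 0.2, 0.0],   # unrelated region (easy negative)
    [0.0, 1.0, 0.0],    # unrelated region (easy negative)
]

best, scores = match_region(text, regions)
hard = hardest_negatives(scores, positive_idx=0, k=2)
loss_hard = info_nce(scores, 0, hard)
loss_easy = info_nce(scores, 0, [2, 3])
print("predicted region:", best)
print("hard negatives:", hard)
print("loss with hard negatives is larger:", loss_hard > loss_easy)
```

Training against the most similar distractors yields a larger loss than training against easy ones, which is the usual motivation for hard-negative selection in contrastive objectives. Matching in a shared embedding space also requires only one similarity computation per candidate, rather than an autoregressive or query-based decoding pass.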