Are Pretrained Image Matchers Good Enough for SAR-Optical Satellite Registration?

arXiv cs.CV / 4/14/2026


Key Points

  • The paper tests 24 pretrained image matcher families for cross-modal optical–SAR satellite registration in a strict zero-shot setup (no fine-tuning or SAR-domain adaptation) on SpaceNet9 plus two additional benchmarks.
  • Results show asymmetric domain transfer: matchers with explicit cross-modal training do not consistently outperform those without it, with top performance around 3.0 px mean error on labeled SpaceNet9 scenes.
  • XoFTR (trained for visible-thermal matching) and RoMa tie for the lowest reported mean error (~3.0 px); since RoMa achieves this without any cross-modal training, foundation-model features (e.g., DINOv2) may partially provide modality invariance.
  • Protocol and deployment choices strongly affect accuracy: geometry model selection, tile size, and inlier gating can change mean error by as much as 33×, sometimes more than switching matchers.
  • 3D-reconstruction-focused matchers (MASt3R, DUSt3R) are found to be highly sensitive to the evaluation protocol and remain fragile under default settings, indicating they may not be reliable “out of the box” for traditional 2D registration pipelines.
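The protocol levers named above (geometry model selection and inlier gating) can be illustrated with a minimal pure-NumPy sketch: a RANSAC loop that fits an affine model to tie-point correspondences, gates the result on inlier ratio, and reports mean residual error in pixels. This is not the paper's code; the function names, threshold, and ratio values are illustrative assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    # Least-squares affine fit: dst ≈ [x, y, 1] @ A, with src, dst of shape (N, 2).
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])          # (N, 3) homogeneous coords
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)    # A is (3, 2)
    return A

def ransac_affine(src, dst, iters=200, thresh=3.0, min_inlier_ratio=0.3, seed=0):
    """RANSAC over minimal 3-point affine samples, with an inlier-ratio gate.
    Returns (A, mean_px_error) or None if the registration is rejected."""
    rng = np.random.default_rng(seed)
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)  # minimal affine sample
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(X @ A - dst, axis=1)   # per-point residual (px)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < min_inlier_ratio * n:
        return None                                 # inlier gate: reject unreliable fits
    A = fit_affine(src[best_inliers], dst[best_inliers])  # refit on all inliers
    mean_err = np.linalg.norm(X @ A - dst, axis=1)[best_inliers].mean()
    return A, mean_err
```

Swapping `fit_affine` for a homography estimator changes the geometry model; tightening `thresh` or `min_inlier_ratio` changes the gating, which is exactly the kind of deployment choice the paper finds can dominate matcher choice.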

Abstract

Cross-modal optical-SAR (Synthetic Aperture Radar) registration is a bottleneck for disaster response via remote sensing, yet modern image matchers are developed and benchmarked almost exclusively on natural-image domains. We evaluate twenty-four pretrained matcher families, in a zero-shot setting with no fine-tuning or domain adaptation on satellite or SAR data, on SpaceNet9 and two additional cross-modal benchmarks under a deterministic protocol with tiled large-image inference, robust geometric filtering, and tie-point-grounded metrics. Our results reveal asymmetric transfer: matchers with explicit cross-modal training do not uniformly outperform those without it. While XoFTR (trained for visible-thermal matching) and RoMa achieve the lowest reported mean error at 3.0 px on the labeled SpaceNet9 training scenes, RoMa achieves this without any cross-modal training, and MatchAnything-ELoFTR (3.4 px), trained on synthetic cross-modal pairs, follows closely, suggesting (as a working hypothesis) that foundation-model features (DINOv2) may contribute a modality invariance that partially substitutes for explicit cross-modal supervision. 3D-reconstruction matchers (MASt3R, DUSt3R), which are not designed for traditional 2D image matching, are highly protocol-sensitive and remain fragile under default settings. Deployment protocol choices (geometry model, tile size, inlier gating) shift accuracy by up to 33× for a single matcher, sometimes exceeding the effect of swapping matchers entirely within the evaluated sweep; affine geometry alone reduces mean error from 12.34 to 9.74 px. These findings inform both practical deployment of existing matchers and future matcher design for cross-modal satellite registration.
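The tiled large-image inference mentioned in the protocol can be sketched as follows: split each large scene into overlapping tiles, run the matcher per tile pair, and lift tile-local matches back to global pixel coordinates. This is a hypothetical sketch, not the paper's pipeline; `match_fn` is a placeholder for any pretrained matcher, and it assumes the two images are already coarsely co-registered so corresponding tiles overlap.

```python
import numpy as np

def tile_origins(size, tile, overlap):
    """1-D tile start positions covering [0, size) with the given overlap."""
    if size <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, size - tile, step))
    starts.append(size - tile)  # final tile flush with the image edge
    return starts

def run_tiled(match_fn, img_a, img_b, tile=1024, overlap=128):
    """Run a pairwise matcher tile-by-tile and lift matches to global coords.
    match_fn(a, b) is assumed to return (pts_a, pts_b) as (N, 2) arrays of
    (x, y) points in tile-local coordinates."""
    h, w = img_a.shape[:2]
    all_a, all_b = [], []
    for y in tile_origins(h, tile, overlap):
        for x in tile_origins(w, tile, overlap):
            pa, pb = match_fn(img_a[y:y + tile, x:x + tile],
                              img_b[y:y + tile, x:x + tile])
            if len(pa):
                offset = np.array([x, y], dtype=float)  # tile origin in (x, y)
                all_a.append(pa + offset)
                all_b.append(pb + offset)
    if not all_a:
        return np.empty((0, 2)), np.empty((0, 2))
    return np.vstack(all_a), np.vstack(all_b)
```

The pooled global correspondences would then feed the robust geometric filtering stage; note that tile size is one of the protocol knobs the paper finds can shift accuracy substantially.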