CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration

arXiv cs.CV / 4/8/2026


Key Points

  • CRFT (Consistent-Recurrent Feature Flow Transformer) is proposed as a unified coarse-to-fine framework based on feature flow learning, aimed at robust cross-modal image registration.
  • Built on a transformer architecture, it learns a modality-independent feature flow representation while jointly performing feature alignment and flow estimation.
  • The coarse stage establishes global correspondences via multi-scale correlation; the fine stage refines local correspondences through hierarchical feature fusion and adaptive spatial reasoning.
  • For geometric adaptability, a recurrent mechanism combining iterative discrepancy-guided attention with a Spatial Geometric Transform (SGT) progressively corrects the flow field, strengthening feature-level consistency.
  • CRFT is reported to outperform existing methods on diverse cross-modal datasets in both accuracy and robustness; broad applications to remote sensing, autonomous driving, and medical imaging are suggested, and code is publicly released.
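The coarse stage's idea of seeding a flow field from a global correlation volume can be illustrated with a toy numpy sketch. This is not the paper's code: the function name, shapes, and argmax matching are illustrative assumptions, and a real implementation would operate on learned, multi-scale transformer features rather than raw arrays.

```python
# Hypothetical sketch of coarse global matching: build a full correlation
# volume between two coarse feature maps, then take the argmax match per
# pixel to seed an integer flow field (source -> target displacement).
import numpy as np

def coarse_flow(feat_a, feat_b):
    """feat_a, feat_b: (H, W, C) coarse feature maps.
    Returns an integer flow field (H, W, 2) with (dy, dx) per pixel."""
    H, W, C = feat_a.shape
    a = feat_a.reshape(H * W, C)
    b = feat_b.reshape(H * W, C)
    corr = a @ b.T                      # (H*W, H*W) correlation volume
    best = corr.argmax(axis=1)          # best-matching pixel in feat_b
    ys, xs = np.divmod(best, W)         # flat index back to 2-D coords
    grid_y, grid_x = np.mgrid[0:H, 0:W]
    return np.stack([ys.reshape(H, W) - grid_y,
                     xs.reshape(H, W) - grid_x], axis=-1)
```

In practice such a dense correlation volume is only affordable at coarse resolution, which is exactly why a coarse-to-fine design pairs it with a cheaper local refinement stage.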

Abstract

We present Consistent-Recurrent Feature Flow Transformer (CRFT), a unified coarse-to-fine framework based on feature flow learning for robust cross-modal image registration. CRFT learns a modality-independent feature flow representation within a transformer-based architecture that jointly performs feature alignment and flow estimation. The coarse stage establishes global correspondences through multi-scale feature correlation, while the fine stage refines local details via hierarchical feature fusion and adaptive spatial reasoning. To enhance geometric adaptability, an iterative discrepancy-guided attention mechanism with a Spatial Geometric Transform (SGT) recurrently refines the flow field, progressively capturing subtle spatial inconsistencies and enforcing feature-level consistency. This design enables accurate alignment under large affine and scale variations while maintaining structural coherence across modalities. Extensive experiments on diverse cross-modal datasets demonstrate that CRFT consistently outperforms state-of-the-art registration methods in both accuracy and robustness. Beyond registration, CRFT provides a generalizable paradigm for multimodal spatial correspondence, offering broad applicability to remote sensing, autonomous navigation, and medical imaging. Code and datasets are publicly available at https://github.com/NEU-Liuxuecong/CRFT.
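The recurrent refinement described in the abstract, where a discrepancy signal repeatedly nudges the flow field toward feature-level consistency, can be sketched in miniature. The sketch below is a plain greedy local search in numpy, not CRFT's attention-based mechanism: each iteration perturbs the current flow by small candidate offsets and keeps the one minimizing the feature discrepancy.

```python
# Minimal, hypothetical sketch of recurrent discrepancy-guided refinement:
# iteratively adjust each pixel's flow by the small offset that best
# reduces the squared feature difference between source and warped target.
import numpy as np

def refine_flow(feat_a, feat_b, flow, iters=4, radius=1):
    """feat_a, feat_b: (H, W, C); flow: integer (H, W, 2) initial estimate."""
    H, W, _ = feat_a.shape
    flow = flow.copy()
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)]
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                best_err, best_off = None, (0, 0)
                for dy, dx in offsets:
                    ty = y + flow[y, x, 0] + dy   # candidate target row
                    tx = x + flow[y, x, 1] + dx   # candidate target col
                    if 0 <= ty < H and 0 <= tx < W:
                        err = np.sum((feat_a[y, x] - feat_b[ty, tx]) ** 2)
                        if best_err is None or err < best_err:
                            best_err, best_off = err, (dy, dx)
                flow[y, x, 0] += best_off[0]
                flow[y, x, 1] += best_off[1]
    return flow
```

Starting from a roughly correct coarse flow, a few iterations of small corrections suffice, which mirrors why the coarse global stage and the recurrent fine stage are complementary.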