Disentangle-then-Align: Non-Iterative Hybrid Multimodal Image Registration via Cross-Scale Feature Disentanglement

arXiv cs.CV / 3/23/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces HRNet, a Hybrid Registration Network that combines representation disentanglement with hybrid parameter prediction for multimodal image registration.
It uses a shared backbone with Modality-Specific Batch Normalization and a Cross-scale Disentanglement and Adaptive Projection module to suppress modality-private cues and stabilize the shared feature space.
A Hybrid Parameter Prediction Module performs non-iterative coarse-to-fine estimation of global rigid parameters and deformation fields, yielding a coherent deformation field.
The method achieves state-of-the-art performance on both rigid and non-rigid registration across four multimodal datasets, with the code released on the project website.

Abstract

Multimodal image registration is a fundamental task and a prerequisite for downstream cross-modal analysis. Despite recent progress in shared feature extraction and multi-scale architectures, two key limitations remain. First, some methods use disentanglement to learn shared features but mainly regularize the shared part, allowing modality-private cues to leak into the shared space. Second, most multi-scale frameworks support only a single transformation type, limiting their applicability when global misalignment and local deformation coexist. To address these issues, we formulate hybrid multimodal registration as jointly learning a stable shared feature space and a unified hybrid transformation. Based on this view, we propose HRNet, a Hybrid Registration Network that couples representation disentanglement with hybrid parameter prediction. A shared backbone with Modality-Specific Batch Normalization (MSBN) extracts multi-scale features, while a Cross-scale Disentanglement and Adaptive Projection (CDAP) module suppresses modality-private cues and projects shared features into a stable subspace for matching. Built on this shared space, a Hybrid Parameter Prediction Module (HPPM) performs non-iterative coarse-to-fine estimation of global rigid parameters and deformation fields, which are fused into a coherent deformation field. Extensive experiments on four multimodal datasets demonstrate state-of-the-art performance on rigid and non-rigid registration tasks. The code is available at the project website.

Is AI becoming a bubble, and could it end like the dot-com crash?

Reddit r/artificial

Externalizing State

Dev.to

I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game.

Dev.to

My AI Does Not Have a Clock

Dev.to

How to settle on a coding LLM ? What parameters to watch out for ?

Reddit r/LocalLLaMA

Disentangle-then-Align: Non-Iterative Hybrid Multimodal Image Registration via Cross-Scale Feature Disentanglement

Key Points

Abstract

Related Articles

Is AI becoming a bubble, and could it end like the dot-com crash?

Externalizing State

I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game.

My AI Does Not Have a Clock

How to settle on a coding LLM ? What parameters to watch out for ?

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer