IUP-Pose: Decoupled Iterative Uncertainty Propagation for Real-time Relative Pose Regression via Implicit Dense Alignment v1

arXiv cs.CV / 3/23/2026


Key Points

  • The paper identifies two core bottlenecks in Relative Pose Regression (RPR): the coupling between rotation and translation estimation, and insufficient cross-view feature alignment.
  • It proposes IUP-Pose, a geometry-driven decoupled iterative framework with implicit dense alignment, in which a lightweight Multi-Head Bi-Cross Attention (MHBC) module aligns cross-view features without explicit matching supervision.
  • The method employs a decoupled rotation-translation pipeline with two shared-parameter rotation stages that iteratively refine rotation under uncertainty, followed by feature realignment via rotational homography H_inf before translation prediction.
  • It reports strong results on MegaDepth1500 (73.3% AUC@20deg) with 70 FPS throughput and 37M parameters, indicating a favorable accuracy-efficiency trade-off for real-time edge deployment.
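
For readers unfamiliar with the metric in the last point, here is a minimal sketch of how a pose AUC at a 20-degree threshold is commonly computed on two-view benchmarks. The summary does not describe the evaluation protocol, so the convention here is an assumption: the per-pair error is usually taken as the larger of the rotation and translation angular errors, and the AUC is the normalized area under the cumulative error curve up to the threshold. All function names are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    """Angle (degrees) between translation directions; scale is ignored."""
    cos = np.dot(t_est, t_gt) / (np.linalg.norm(t_est) * np.linalg.norm(t_gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_auc(errors_deg, threshold_deg=20.0):
    """Normalized area under the recall-vs-error curve up to threshold_deg.

    errors_deg holds one pose error per image pair (assumed convention:
    max of rotation and translation angular error).
    """
    errors = np.sort(np.asarray(errors_deg, dtype=float))
    recall = np.arange(1, len(errors) + 1) / len(errors)
    # Start the curve at (0, 0) so the integral is well defined.
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    # Keep only the part of the curve below the threshold, then extend it
    # horizontally to the threshold itself.
    last = np.searchsorted(errors, threshold_deg)
    e = np.concatenate((errors[:last], [threshold_deg]))
    r = np.concatenate((recall[:last], [recall[last - 1]]))
    # Trapezoidal integration, written out to avoid NumPy version differences.
    area = np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2.0)
    return area / threshold_deg
```

Under this convention, per-pair errors of 5, 10, and 30 degrees give an AUC@20deg of 0.5, and a set of pairs with zero error gives 1.0.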

Abstract

Relative pose estimation is fundamental for SLAM, visual localization, and 3D reconstruction. Existing Relative Pose Regression (RPR) methods face a key trade-off: feature-matching pipelines achieve high accuracy but block gradient flow via non-differentiable RANSAC, while ViT-based regressors are end-to-end trainable but prohibitively expensive for real-time deployment. We identify the core bottlenecks as the coupling between rotation and translation estimation and insufficient cross-view feature alignment. We propose IUP-Pose, a geometry-driven decoupled iterative framework with implicit dense alignment. A lightweight Multi-Head Bi-Cross Attention (MHBC) module aligns cross-view features without explicit matching supervision. The aligned features are processed by a decoupled rotation-translation pipeline: two shared-parameter rotation stages iteratively refine rotation with uncertainty, and feature maps are realigned via rotational homography H_inf before translation prediction. IUP-Pose achieves 73.3% AUC@20deg on MegaDepth1500 with full end-to-end differentiability, 70 FPS throughput, and only 37M parameters, demonstrating a favorable accuracy-efficiency trade-off for real-time edge deployment.
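
The realignment step can be illustrated with the standard infinite homography from multi-view geometry, H_inf = K_dst R K_src^-1, which maps pixels between two views as if the camera had only rotated. The sketch below is a hedged NumPy illustration, not the authors' implementation: the abstract does not say how H_inf is applied to feature maps, and the function names, the example intrinsics, and the rotation convention (R maps points from the source camera frame to the target camera frame) are all assumptions.

```python
import numpy as np

def infinite_homography(K_src, K_dst, R):
    """Rotation-only homography H_inf = K_dst @ R @ inv(K_src)."""
    return K_dst @ R @ np.linalg.inv(K_src)

def warp_pixels(H, pixels):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pixels = np.asarray(pixels, dtype=float)
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def rotation_y(angle_deg):
    """Rotation about the camera y-axis (hypothetical helper for the demo)."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

def project(K, X):
    """Pinhole projection of a 3D point given in the camera frame."""
    x = K @ X
    return x[:2] / x[2]

# Demo with made-up intrinsics: a purely rotating camera observes one 3D point.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_y(10.0)
X = np.array([0.2, -0.1, 2.0])

x_src = project(K, X)          # pixel in the source view
x_dst = project(K, R @ X)      # pixel in the rotated view
x_warp = warp_pixels(infinite_homography(K, K, R), [x_src])[0]

# For pure rotation the warped source pixel coincides with the target pixel.
print(np.allclose(x_warp, x_dst))
```

Once the rotational component has been warped away like this, the remaining displacement between corresponding pixels depends only on translation and scene depth, which is one plausible reading of why the realignment precedes translation prediction.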