Keep It CALM: Toward Calibration-Free Kilometer-Level SLAM with Visual Geometry Foundation Models via an Assistant Eye

arXiv cs.RO, April 17, 2026


Key Points

  • The paper argues that using only a single linear transform (e.g., Sim3/SL4) to align sub-maps is insufficient for kilometer-level SLAM with Visual Geometry Foundation Models (VGFMs) because VGFM outputs can include complex non-linear geometric distortions.
  • It proposes CAL2M (Calibration-free Assistant-eye based Large-scale Localization and Mapping), a plug-and-play SLAM framework that eliminates scale ambiguity by exploiting the constant physical spacing of an added “assistant eye” camera, requiring no temporal or spatial pre-calibration.
  • CAL2M includes an epipolar-guided intrinsic-and-pose correction model that uses feature matching and online intrinsic search to decompose the fundamental matrix and fix rotation/translation errors stemming from inaccurate intrinsics.
  • To prevent drift and divergence, the method introduces anchor-propagation-based globally consistent mapping: anchors are constructed and fused along the trajectory to establish a direct local-to-global relationship, enabling nonlinear elastic alignment of sub-maps.
  • The authors state that the CAL2M source code will be released publicly on GitHub.
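The "assistant eye" prior in the second point can be illustrated with a toy calculation: if two rigidly mounted cameras sit a known, fixed physical distance apart, the factor that maps a scale-ambiguous reconstruction to metric units follows from comparing the estimated spacing to the physical one. A minimal sketch of that idea (a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def metric_scale_from_spacing(centers_main, centers_assist, spacing_m):
    """Estimate the factor that maps a scale-ambiguous reconstruction
    to metric units, given two cameras rigidly mounted a fixed
    physical distance apart.

    centers_main, centers_assist: (N, 3) estimated camera centers of
    the main and assistant views in the reconstruction's arbitrary frame.
    spacing_m: the known, constant physical spacing in meters.
    """
    d = np.linalg.norm(centers_main - centers_assist, axis=1)
    # Median over frames down-weights frames where the local
    # reconstruction is distorted.
    return spacing_m / np.median(d)
```

Multiplying all map points and translation components by the returned factor yields a metric reconstruction, which is why no temporal or spatial pre-calibration beyond the known spacing is needed.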

Abstract

Visual Geometry Foundation Models (VGFMs) demonstrate remarkable zero-shot capabilities in local reconstruction. However, deploying them for kilometer-level Simultaneous Localization and Mapping (SLAM) remains challenging. In such scenarios, current approaches mainly rely on linear transforms (e.g., Sim3 and SL4) for sub-map alignment, while we argue that a single linear transform is fundamentally insufficient to model the complex, non-linear geometric distortions inherent in VGFM outputs. Forcing such rigid alignment leads to the rapid accumulation of uncorrected residuals, eventually resulting in significant trajectory drift and map divergence. To address these limitations, we present CAL2M (Calibration-free Assistant-eye based Large-scale Localization and Mapping), a plug-and-play framework compatible with arbitrary VGFMs. Distinct from traditional systems, CAL2M introduces an "assistant eye" solely to leverage the prior of constant physical spacing, effectively eliminating scale ambiguity without any temporal or spatial pre-calibration. Furthermore, leveraging the assumption of accurate feature matching, we propose an epipolar-guided intrinsic and pose correction model. Supported by an online intrinsic search module, it can effectively rectify rotation and translation errors caused by inaccurate intrinsics through fundamental matrix decomposition. Finally, to ensure accurate mapping, we introduce a globally consistent mapping strategy based on anchor propagation. By constructing and fusing anchors across the trajectory, we establish a direct local-to-global mapping relationship. This enables the application of nonlinear transformations to elastically align sub-maps, effectively eliminating geometric misalignments and ensuring a globally consistent reconstruction. The source code of CAL2M will be publicly available at https://github.com/IRMVLab/CALM.
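The intrinsic-and-pose correction described in the abstract builds on a classical result: once an intrinsic hypothesis K is fixed (e.g., one candidate from an online intrinsic search), a fundamental matrix estimated from feature matches upgrades to an essential matrix, which decomposes into four relative-pose candidates. A NumPy sketch of this standard pipeline (the textbook decomposition, not CAL2M's actual correction model; function names are illustrative):

```python
import numpy as np

def essential_from_fundamental(F, K):
    """Upgrade a fundamental matrix to an essential matrix under an
    intrinsic hypothesis K, then project onto the valid essential
    manifold by forcing the singular values to (1, 1, 0)."""
    E = K.T @ F @ K
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

def decompose_essential(E):
    """Return the four (R, t) relative-pose candidates encoded by an
    essential matrix; in a full system, a cheirality check against
    triangulated points selects the physically valid one."""
    U, _, Vt = np.linalg.svd(E)
    # Keep U and V proper rotations so every candidate R is a rotation.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    t = U[:, 2]  # translation is recovered only up to scale
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]
```

Because a wrong K corrupts E and hence the recovered rotation and translation, searching over intrinsic hypotheses and scoring them (e.g., by epipolar residuals of the matches) is one plausible way such a correction model can rectify pose errors caused by inaccurate intrinsics.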