Native-Domain Cross-Attention for Camera-LiDAR Extrinsic Calibration Under Large Initial Perturbations

arXiv cs.CV / 4/1/2026


Key Points

  • The paper addresses camera–LiDAR extrinsic calibration by improving cross-modal correspondence when the initial extrinsic guess is far from the ground truth.
  • It proposes an extrinsic-aware cross-attention framework that matches image patches with LiDAR point groups in their native domains, avoiding 3D distortion from LiDAR-to-depth-map projection.
  • The method injects extrinsic parameter hypotheses directly into the attention/correspondence modeling to maintain geometry-consistent fusion across modalities.
  • Experiments on KITTI and nuScenes show consistent performance gains over prior state-of-the-art methods in both accuracy and robustness under large perturbations.
  • The authors report calibration success rates of 88% on KITTI and 99% on nuScenes under large extrinsic perturbations, and have open-sourced the code at https://github.com/gitouni/ProjFusion.

Abstract

Accurate camera-LiDAR fusion relies on precise extrinsic calibration, which fundamentally depends on establishing reliable cross-modal correspondences under potentially large misalignments. Existing learning-based methods typically project LiDAR points into depth maps for feature fusion, which distorts 3D geometry and degrades performance when the extrinsic initialization is far from the ground truth. To address this issue, we propose an extrinsic-aware cross-attention framework that directly aligns image patches and LiDAR point groups in their native domains. The proposed attention mechanism explicitly injects extrinsic parameter hypotheses into the correspondence modeling process, enabling geometry-consistent cross-modal interaction without relying on projected 2D depth maps. Extensive experiments on the KITTI and nuScenes benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches in both accuracy and robustness. Under large extrinsic perturbations, our approach achieves accurate calibration in 88% of KITTI cases and 99% of nuScenes cases, substantially surpassing the second-best baseline. We have open-sourced our code at https://github.com/gitouni/ProjFusion to benefit the community.
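To make the core idea concrete, here is a minimal NumPy sketch of how an extrinsic hypothesis might condition cross-attention between image patches and LiDAR point groups. This is an illustrative simplification, not the paper's actual architecture: the function name, the random toy features, and the linear positional projection `W_pos` (standing in for a learned geometric encoding) are all assumptions. The key step is that LiDAR group centroids are transformed by the hypothesized extrinsics (R, t) before they influence the attention keys, so the correspondence weights change as the extrinsic hypothesis changes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # feature dimension (illustrative)
img_feats = rng.normal(size=(8, d))       # 8 image-patch queries
pt_feats = rng.normal(size=(12, d))       # 12 LiDAR point-group features
pt_centroids = rng.normal(size=(12, 3))   # group centroids in the LiDAR frame

# A hypothesized extrinsic transform: yaw rotation R plus translation t.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.2, -0.1, 0.05])

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extrinsic_aware_cross_attention(q, k_feats, centroids, R, t, W_pos):
    """Cross-attention whose keys are conditioned on the extrinsic hypothesis."""
    # Move LiDAR centroids into the hypothesized camera frame (no 2D projection),
    # then fold that geometry into the keys via a positional projection.
    cam_pts = centroids @ R.T + t          # (12, 3) in camera frame
    k = k_feats + cam_pts @ W_pos          # geometry-conditioned keys
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)        # (8, 12) soft correspondence weights
    return attn @ k_feats, attn

W_pos = rng.normal(size=(3, d)) * 0.1      # stand-in for a learned encoding
fused, attn = extrinsic_aware_cross_attention(
    img_feats, pt_feats, pt_centroids, R, t, W_pos)
print(fused.shape, attn.shape)             # (8, 16) (8, 12)
```

Because the point groups stay in 3D throughout, no depth-map projection is involved; different (R, t) hypotheses simply reshape the attention map, which is the geometry-consistent interaction the abstract describes.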