BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

arXiv cs.CV / 5/6/2026


Key Points

  • The paper introduces BEVCALIB, a new approach for LiDAR–camera calibration that uses bird’s-eye view (BEV) features derived directly from raw data.
  • It extracts camera BEV features and LiDAR BEV features separately, then fuses them into a shared BEV feature space from which the cross-modal transformation is learned.
  • A geometry-guided feature selector picks the most informative BEV features for the transformation decoder, lowering memory usage and improving training efficiency (see the sketch after this list).
  • Experiments on KITTI, NuScenes, and the authors' own dataset show state-of-the-art performance, with large gains in translation and rotation accuracy under various noise conditions.
  • The authors report substantial improvements over open-source reproducible baselines (up to an order of magnitude) and provide code and demo materials online.
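
A minimal sketch of the pipeline the key points describe, assuming a PyTorch-style model; the module names, tensor shapes, top-k selection rule, and 6-DoF output parameterization are illustrative assumptions rather than the authors' released implementation:

```python
# Illustrative sketch (not the authors' code): fuse camera and LiDAR BEV maps,
# select the highest-scoring BEV cells, and regress a 6-DoF extrinsic correction.
import torch
import torch.nn as nn


class BEVCalibSketch(nn.Module):
    def __init__(self, bev_channels: int = 64, num_selected: int = 256):
        super().__init__()
        self.num_selected = num_selected
        # Fuse the two BEV maps (already on a shared grid) into one feature map.
        self.fuse = nn.Conv2d(2 * bev_channels, bev_channels, kernel_size=3, padding=1)
        # Importance score per BEV cell; stands in for the geometry-guided selector.
        self.score = nn.Conv2d(bev_channels, 1, kernel_size=1)
        # Transformation decoder operating on the selected features only.
        self.decoder = nn.Sequential(
            nn.Linear(bev_channels, 128),
            nn.ReLU(),
            nn.Linear(128, 6),  # (tx, ty, tz, rx, ry, rz), e.g. axis-angle rotation
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV feature maps on the same grid.
        fused = torch.relu(self.fuse(torch.cat([cam_bev, lidar_bev], dim=1)))
        scores = self.score(fused).flatten(2)             # (B, 1, H*W)
        feats = fused.flatten(2).transpose(1, 2)          # (B, H*W, C)
        # Keep only the top-k cells, shrinking the decoder's input and memory cost.
        topk = scores.topk(self.num_selected, dim=-1).indices.squeeze(1)  # (B, k)
        selected = torch.gather(
            feats, 1, topk.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        )                                                 # (B, k, C)
        return self.decoder(selected.mean(dim=1))         # (B, 6) extrinsic correction


if __name__ == "__main__":
    model = BEVCalibSketch()
    cam, lidar = torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128)
    print(model(cam, lidar).shape)  # torch.Size([2, 6])
```

The point the paper emphasizes is the selection step: decoding from a few hundred salient BEV cells instead of the full BEV grid is what reduces memory consumption and keeps training efficient.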

Abstract

Accurate LiDAR-camera calibration is fundamental to fusing multi-modal perception in autonomous driving and robotic systems. Traditional calibration methods require extensive data collection in controlled environments and cannot compensate for transformation changes during vehicle/robot movement. In this paper, we propose the first model that uses bird's-eye view (BEV) features to perform LiDAR-camera calibration from raw data, termed BEVCALIB. To achieve this, we extract camera BEV features and LiDAR BEV features separately and fuse them into a shared BEV feature space. To fully utilize the geometric information from the BEV features, we introduce a novel feature selector to filter the most important features in the transformation decoder, which reduces memory consumption and enables efficient training. Extensive evaluations on KITTI, NuScenes, and our own dataset demonstrate that BEVCALIB establishes a new state of the art. Under various noise conditions, BEVCALIB outperforms the best baseline in the literature by an average of (47.08%, 82.32%) on the KITTI dataset and (78.17%, 68.29%) on the NuScenes dataset, in terms of (translation, rotation) accuracy, respectively. In the open-source domain, it improves over the best reproducible baseline by one order of magnitude. Our code and demo results are available at https://cisl.ucr.edu/BEVCalib.
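
The noise-condition evaluation mentioned in the abstract can be pictured with a small sketch: perturb the ground-truth extrinsic with random translation/rotation noise and measure how far a predicted extrinsic remains from the ground truth. The noise magnitudes and error definitions below are generic assumptions, not the paper's exact protocol; the (translation, rotation) percentages quoted above are relative improvements over the best baseline in errors of this kind.

```python
# Generic sketch of extrinsic noise injection and (translation, rotation) error metrics;
# noise ranges and error definitions are assumptions, not the paper's exact protocol.
import numpy as np
from scipy.spatial.transform import Rotation as R


def perturb_extrinsic(T_gt: np.ndarray, trans_noise_m=0.1, rot_noise_deg=1.0) -> np.ndarray:
    """Apply random SE(3) noise to a 4x4 ground-truth LiDAR-to-camera extrinsic."""
    noise = np.eye(4)
    noise[:3, :3] = R.from_euler(
        "xyz", np.random.uniform(-rot_noise_deg, rot_noise_deg, 3), degrees=True
    ).as_matrix()
    noise[:3, 3] = np.random.uniform(-trans_noise_m, trans_noise_m, 3)
    return noise @ T_gt


def calib_errors(T_pred: np.ndarray, T_gt: np.ndarray) -> tuple[float, float]:
    """Return (translation error in meters, rotation error in degrees)."""
    delta = np.linalg.inv(T_gt) @ T_pred
    t_err = float(np.linalg.norm(delta[:3, 3]))
    r_err = float(np.degrees(np.linalg.norm(R.from_matrix(delta[:3, :3]).as_rotvec())))
    return t_err, r_err
```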