CAVERS: Multimodal SLAM Data from a Natural Karstic Cave with Ground Truth Motion Capture

arXiv cs.RO / 4/17/2026


Key Points

  • The paper introduces CAVERS, a multimodal dataset designed specifically for autonomous robot perception and navigation in natural karstic caves, where conditions differ significantly from mines or tunnels.
  • CAVERS includes 24 sequences (about 335 GB) collected in two structurally distinct rooms at Cueva de la Victoria in Spain, using RGB-D, near-IR thermal, and LiDAR sensors in both handheld and rover-mounted setups.
  • Most sequences come with millimeter-accurate 6-DoF ground-truth pose and velocity at 120 Hz provided by an OptiTrack motion capture system installed inside the cave.
  • The authors benchmark seven state-of-the-art SLAM/odometry methods across multiple sensing modalities (visual, visual-inertial, thermal-inertial, and LiDAR-based) plus a 3D reconstruction pipeline, demonstrating dataset usability.
  • The dataset and supplementary materials are publicly available on GitHub (https://github.com/spaceuma/cavers), supporting direct benchmarking of cave SLAM and multimodal perception methods.
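Evaluating against the 120 Hz ground truth typically begins by pairing each estimated pose with a motion-capture sample. The following is a minimal sketch of nearest-timestamp association. It assumes both streams are stamped on a common clock in seconds and that ground-truth stamps are sorted. The function name `associate` and the half-period tolerance are hypothetical illustration choices, not part of the CAVERS tooling.

```python
import bisect
from typing import List, Tuple

# Assumption: ground-truth mocap samples arrive at a nominal 120 Hz.
GT_RATE_HZ = 120.0
# Hypothetical tolerance: half the ground-truth period (about 4.17 ms).
MAX_DT = 0.5 / GT_RATE_HZ


def associate(est_stamps: List[float], gt_stamps: List[float],
              max_dt: float = MAX_DT) -> List[Tuple[int, int]]:
    """Pair each estimated pose with its nearest ground-truth sample.

    Both lists hold timestamps in seconds; gt_stamps must be sorted.
    Returns (est_index, gt_index) pairs whose time gap is <= max_dt.
    """
    pairs = []
    for i, t in enumerate(est_stamps):
        # Index of the first ground-truth stamp >= t.
        j = bisect.bisect_left(gt_stamps, t)
        best = None
        # The nearest sample is either just before t or at/after t.
        for k in (j - 1, j):
            if 0 <= k < len(gt_stamps):
                dt = abs(gt_stamps[k] - t)
                if best is None or dt < best[1]:
                    best = (k, dt)
        # Drop estimates that fall outside the ground-truth coverage.
        if best is not None and best[1] <= max_dt:
            pairs.append((i, best[0]))
    return pairs


if __name__ == "__main__":
    gt = [k / GT_RATE_HZ for k in range(1200)]  # 10 s of synthetic mocap stamps
    est = [0.0, 0.1003, 0.2, 5.0011, 12.0]      # the last stamp lies past the mocap data
    print(associate(est, gt))                   # [(0, 0), (1, 12), (2, 24), (3, 600)]
```

With a tolerance of half the mocap period, every estimate inside the ground-truth time span finds a partner. Estimates outside it, such as the 12.0 s stamp, are discarded instead of being matched to a distant sample.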

Abstract

Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflective wet surfaces, near-zero ambient light, and complex branching passages. Yet publicly available datasets targeting this environment remain scarce and offer limited sensing modalities and environmental diversity. We present CAVERS, a multimodal dataset acquired in two structurally distinct rooms of Cueva de la Victoria, Málaga, Spain, comprising 24 sequences totaling approximately 335 GB of recorded data. The sensor suite combines an Intel RealSense D435i RGB-D-I camera, an Optris PI640i near-IR thermal camera, and a Velodyne VLP-16 LiDAR, operated both handheld and mounted on a wheeled rover, under both full darkness and artificial illumination. For most of the sequences, mm-accurate 6-DoF ground-truth pose and velocity at 120 Hz are provided by an OptiTrack motion capture system installed directly inside the cave. We benchmark seven state-of-the-art SLAM and odometry algorithms spanning visual, visual-inertial, thermal-inertial, and LiDAR-based pipelines, as well as a 3D reconstruction pipeline, demonstrating the dataset's usability. The dataset and all supplementary material are publicly available at: https://github.com/spaceuma/cavers.
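A common way to score SLAM and odometry output against motion-capture ground truth is the absolute trajectory error (ATE). The estimate is rigidly aligned to the ground truth, and the RMSE of the position residuals is then computed. The abstract does not say which metric the CAVERS benchmark reports, so the following is only a minimal sketch under stated assumptions. It assumes NumPy and (N, 3) arrays of positions that are already matched by timestamp. It uses the closed-form Umeyama/Kabsch alignment without scale. The names `align_rigid` and `ate_rmse` are hypothetical.

```python
import numpy as np


def align_rigid(est: np.ndarray, gt: np.ndarray):
    """Closed-form R, t minimising sum_i ||R @ e_i + t - g_i||^2.

    est, gt: (N, 3) arrays of timestamp-matched positions (N >= 3, not collinear).
    Umeyama's method without the scale term (equivalent to Kabsch).
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    # 3x3 cross-covariance (1/N) * sum_i g_i e_i^T of the centred points.
    H = G.T @ E / len(est)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return R, t


def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Absolute trajectory error: RMSE of position residuals after rigid alignment."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t  # applies R @ e_i + t to every row
    residuals = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(-5.0, 5.0, size=(200, 3))  # synthetic ground-truth positions
    c, s = np.cos(np.deg2rad(30.0)), np.sin(np.deg2rad(30.0))
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Same trajectory expressed in another frame: e_i = Rz^T (g_i - t0).
    est = (gt - np.array([1.0, 2.0, 0.5])) @ Rz
    print(ate_rmse(est, gt))    # near zero: a pure frame change is aligned away
    noisy = est + rng.normal(scale=0.01, size=est.shape)
    print(ate_rmse(noisy, gt))  # on the order of the injected noise
```

Rigid SE(3) alignment fits pipelines with metric scale, such as the LiDAR-based, visual-inertial, and thermal-inertial ones. A purely monocular visual pipeline would instead need a similarity (Sim(3)) alignment that also estimates scale.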