Incremental Semantics-Aided Meshing from LiDAR-Inertial Odometry and RGB Direct Label Transfer

arXiv cs.RO / 4/13/2026


Key Points

  • The paper addresses high-fidelity 3D mesh reconstruction in large indoor spaces (e.g., cultural buildings), where LiDAR-inertial odometry suffers from point sparsity, drift, and fixed fusion parameters that cause holes, over-smoothing, and spurious surfaces.
  • It proposes a modular, incremental pipeline that performs frame-by-frame direct label transfer: a vision foundation model labels each incoming RGB frame, and those labels are projected and fused onto a LiDAR-inertial odometry map (see the projection sketch after this list).
  • An incremental semantics-aware TSDF fusion step generates the final mesh (via marching cubes), aiming to preserve LiDAR geometric accuracy while resolving boundary ambiguities (a toy voxel-update sketch follows this list).
  • Experiments on the Oxford Spires dataset show improved geometric metrics compared with state-of-the-art geometric baselines (ImMesh, Voxblox), and additional qualitative results are provided on the NTU VIRAL dataset.
  • The authors argue that the output semantically labeled meshes can facilitate downstream USD asset creation for XR and digital-modeling workflows.
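
To make the direct label transfer step concrete, here is a minimal sketch of how per-frame label projection could work: LiDAR points are projected into a labeled RGB frame through a pinhole camera model and read back their class IDs. The intrinsics `K`, the camera-from-LiDAR extrinsics `T_cam_lidar`, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def transfer_labels(points_lidar, label_image, K, T_cam_lidar):
    """Project LiDAR points into a labeled RGB frame and read back
    per-point semantic labels. Points outside the image or behind
    the camera get the label -1 (unknown).

    points_lidar : (N, 3) points in the LiDAR frame
    label_image  : (H, W) integer class IDs from a vision foundation model
    K            : (3, 3) camera intrinsics
    T_cam_lidar  : (4, 4) rigid transform, LiDAR frame -> camera frame
    """
    n = points_lidar.shape[0]
    # Homogeneous transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    labels = np.full(n, -1, dtype=np.int32)
    in_front = pts_cam[:, 2] > 0.0  # keep points in front of the camera
    if not np.any(in_front):
        return labels

    # Pinhole projection to pixel coordinates.
    uvw = (K @ pts_cam[in_front].T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    h, w = label_image.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = label_image[v[valid], u[valid]]
    return labels
```

In the paper's pipeline these per-point labels would then be fused incrementally across frames on the odometry map, e.g., by accumulating votes per map point; the sketch above covers only a single frame.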
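
The semantics-aware TSDF fusion step can likewise be sketched as a standard weighted TSDF update in which semantic agreement modulates the integration weight near boundaries. The agreement bonus below is a hypothetical stand-in for whatever weighting the paper actually uses.

```python
import numpy as np

class SemanticTSDFVoxel:
    """One voxel of a semantics-aware TSDF grid (illustrative only)."""

    def __init__(self, num_classes, trunc=0.1):
        self.d = 0.0                 # running truncated signed distance
        self.w = 0.0                 # accumulated integration weight
        self.trunc = trunc           # truncation distance in meters
        self.votes = np.zeros(num_classes)  # per-class label evidence

    def integrate(self, sdf, label, label_conf=1.0, agree_bonus=0.5):
        """Fuse one observation: a signed distance `sdf` plus a semantic
        label. Observations whose label agrees with the voxel's current
        majority class get a higher weight -- one plausible way to
        sharpen structural boundaries as the summary describes."""
        d_obs = np.clip(sdf, -self.trunc, self.trunc)

        # Semantic agreement against the current majority label
        # (label -1 means unknown and never earns the bonus).
        agree = (label >= 0 and self.votes.any()
                 and label == int(np.argmax(self.votes)))
        w_obs = 1.0 + (agree_bonus if agree else 0.0)

        # Standard weighted running average of the TSDF value.
        self.d = (self.w * self.d + w_obs * d_obs) / (self.w + w_obs)
        self.w = min(self.w + w_obs, 128.0)  # cap to stay responsive

        if label >= 0:
            self.votes[label] += label_conf
```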

Abstract

Geometric high-fidelity mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments, such as cultural buildings, where point-cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB+LiDAR pipeline that generates semantics-aided, high-quality meshes from indoor scans through frame-by-frame direct label transfer. A vision foundation model labels each incoming RGB frame; the labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while leveraging rich visual semantics to resolve geometric ambiguities at reconstruction boundaries caused by LiDAR point-cloud sparsity and drift. We demonstrate that semantic guidance improves geometric reconstruction quality; quantitative evaluation is therefore performed with geometric metrics on the Oxford Spires dataset, while results on the NTU VIRAL dataset are analyzed qualitatively. The proposed method outperforms the state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labeled meshes are valuable when reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
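
As a rough illustration of the final steps the abstract describes, the sketch below extracts a mesh from a dense TSDF volume with scikit-image's marching cubes and writes it out as a USD asset using the `usd-core` (pxr) bindings. The dense grid, output path, and omission of per-face semantic labels are simplifications; the paper's incremental pipeline would not materialize the full volume like this.

```python
import numpy as np
from skimage import measure          # pip install scikit-image
from pxr import Usd, UsdGeom         # pip install usd-core

def tsdf_to_usd(tsdf, voxel_size, out_path="mesh.usda"):
    """Extract the zero-level surface of a dense (X, Y, Z) TSDF volume
    and save it as a USD mesh asset."""
    # Marching cubes at the zero crossing of the signed distance field.
    verts, faces, _, _ = measure.marching_cubes(
        tsdf, level=0.0, spacing=(voxel_size,) * 3)

    stage = Usd.Stage.CreateNew(out_path)
    mesh = UsdGeom.Mesh.Define(stage, "/Reconstruction")

    mesh.GetPointsAttr().Set([tuple(v) for v in verts])
    mesh.GetFaceVertexCountsAttr().Set([3] * len(faces))  # all triangles
    mesh.GetFaceVertexIndicesAttr().Set(faces.flatten().tolist())

    stage.GetRootLayer().Save()
```

Per-face class labels could additionally be stored as a USD primvar, which is presumably what makes these meshes useful as labeled assets in downstream XR and digital-modeling workflows.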