Incremental Semantics-Aided Meshing from LiDAR-Inertial Odometry and RGB Direct Label Transfer

arXiv cs.RO / 4/13/2026


Key Points

  • The paper addresses high-fidelity 3D mesh reconstruction in large indoor spaces (e.g., cultural buildings), where LiDAR-inertial odometry suffers from point sparsity, drift, and fixed fusion parameters that cause holes, over-smoothing, and spurious surfaces.
  • It proposes a modular, incremental pipeline that performs frame-by-frame direct label transfer: a vision foundation model labels each incoming RGB frame, and those labels are projected and fused onto a LiDAR-inertial odometry map (see the projection sketch after this list).
  • An incremental semantics-aware TSDF fusion step generates the final mesh (via marching cubes), aiming to preserve LiDAR geometric accuracy while resolving boundary ambiguities (a toy voxel-update sketch follows this list).
  • Experiments on the Oxford Spires dataset show improved geometric metrics compared with state-of-the-art geometric baselines (ImMesh, Voxblox), and additional qualitative results are provided on the NTU VIRAL dataset.
  • The authors argue that the output semantically labeled meshes can facilitate downstream USD asset creation for XR and digital-modeling workflows.
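
To make the direct label transfer step concrete, here is a minimal sketch of how per-frame label projection could work: LiDAR points are projected into a labeled RGB frame through a pinhole camera model and read back their class IDs. The intrinsics `K`, the camera-from-LiDAR extrinsics `T_cam_lidar`, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def transfer_labels(points_lidar, label_image, K, T_cam_lidar):
    """Project LiDAR points into a labeled RGB frame and read back
    per-point semantic labels. Points outside the image or behind
    the camera get the label -1 (unknown).

    points_lidar : (N, 3) points in the LiDAR frame
    label_image  : (H, W) integer class IDs from a vision foundation model
    K            : (3, 3) camera intrinsics
    T_cam_lidar  : (4, 4) rigid transform, LiDAR frame -> camera frame
    """
    n = points_lidar.shape[0]
    # Homogeneous transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    labels = np.full(n, -1, dtype=np.int32)
    in_front = pts_cam[:, 2] > 0.0  # keep points in front of the camera
    if not np.any(in_front):
        return labels

    # Pinhole projection to pixel coordinates.
    uvw = (K @ pts_cam[in_front].T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    h, w = label_image.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = label_image[v[valid], u[valid]]
    return labels
```

In the paper's pipeline these per-point labels would then be fused incrementally across frames on the odometry map, e.g., by accumulating votes per map point; the sketch above covers only a single frame.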
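
The semantics-aware TSDF fusion step can likewise be sketched as a standard weighted TSDF update in which semantic agreement modulates the integration weight near boundaries. The agreement bonus below is a hypothetical stand-in for whatever weighting the paper actually uses.

```python
import numpy as np

class SemanticTSDFVoxel:
    """One voxel of a semantics-aware TSDF grid (illustrative only)."""

    def __init__(self, num_classes, trunc=0.1):
        self.d = 0.0                 # running truncated signed distance
        self.w = 0.0                 # accumulated integration weight
        self.trunc = trunc           # truncation distance in meters
        self.votes = np.zeros(num_classes)  # per-class label evidence

    def integrate(self, sdf, label, label_conf=1.0, agree_bonus=0.5):
        """Fuse one observation: a signed distance `sdf` plus a semantic
        label. Observations whose label agrees with the voxel's current
        majority class get a higher weight -- one plausible way to
        sharpen structural boundaries as the summary describes."""
        d_obs = np.clip(sdf, -self.trunc, self.trunc)

        # Semantic agreement against the current majority label
        # (label -1 means unknown and never earns the bonus).
        agree = (label >= 0 and self.votes.any()
                 and label == int(np.argmax(self.votes)))
        w_obs = 1.0 + (agree_bonus if agree else 0.0)

        # Standard weighted running average of the TSDF value.
        self.d = (self.w * self.d + w_obs * d_obs) / (self.w + w_obs)
        self.w = min(self.w + w_obs, 128.0)  # cap to stay responsive

        if label >= 0:
            self.votes[label] += label_conf
```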

Abstract

Geometric high-fidelity mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments, such as cultural buildings, where point-cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB+LiDAR pipeline that generates semantics-aided, high-quality meshes from indoor scans through frame-by-frame direct label transfer. A vision foundation model labels each incoming RGB frame; the labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while leveraging rich visual semantics to resolve geometric ambiguities at reconstruction boundaries caused by LiDAR point-cloud sparsity and drift. We demonstrate that semantic guidance improves geometric reconstruction quality; quantitative evaluation is therefore performed with geometric metrics on the Oxford Spires dataset, while results on the NTU VIRAL dataset are analyzed qualitatively. The proposed method outperforms the state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labeled meshes are valuable when reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
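
As a rough illustration of the final steps the abstract describes, the sketch below extracts a mesh from a dense TSDF volume with scikit-image's marching cubes and writes it out as a USD asset using the `usd-core` (pxr) bindings. The dense grid, output path, and omission of per-face semantic labels are simplifications; the paper's incremental pipeline would not materialize the full volume like this.

```python
import numpy as np
from skimage import measure          # pip install scikit-image
from pxr import Usd, UsdGeom         # pip install usd-core

def tsdf_to_usd(tsdf, voxel_size, out_path="mesh.usda"):
    """Extract the zero-level surface of a dense (X, Y, Z) TSDF volume
    and save it as a USD mesh asset."""
    # Marching cubes at the zero crossing of the signed distance field.
    verts, faces, _, _ = measure.marching_cubes(
        tsdf, level=0.0, spacing=(voxel_size,) * 3)

    stage = Usd.Stage.CreateNew(out_path)
    mesh = UsdGeom.Mesh.Define(stage, "/Reconstruction")

    mesh.GetPointsAttr().Set([tuple(v) for v in verts])
    mesh.GetFaceVertexCountsAttr().Set([3] * len(faces))  # all triangles
    mesh.GetFaceVertexIndicesAttr().Set(faces.flatten().tolist())

    stage.GetRootLayer().Save()
```

Per-face class labels could additionally be stored as a USD primvar, which is presumably what makes these meshes useful as labeled assets in downstream XR and digital-modeling workflows.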