U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

arXiv cs.RO / 3/25/2026


Key Points

  • The paper introduces U4D, an uncertainty-aware framework for modeling dynamic 4D LiDAR scenes. It addresses a limitation of existing generative methods, which treat all spatial regions as equally certain.
  • U4D estimates spatial uncertainty maps from a pretrained segmentation model to identify semantically challenging (high-entropy) regions before generation.
  • It generates in a “hard-to-easy” two-stage pipeline: first reconstructing uncertain regions with fine geometric fidelity, then completing remaining areas using uncertainty-conditioned synthesis guided by learned structural priors.
  • To improve temporal stability, U4D uses a mixture of spatio-temporal (MoST) diffusion block that adaptively fuses spatial and temporal representations.
  • Experiments report that U4D yields geometrically faithful and temporally consistent LiDAR sequences, aiming to improve the reliability of autonomous driving perception and simulation.
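The paper does not include implementation details here, but the first step above, deriving a spatial uncertainty map from a pretrained segmentation model's predictions, is commonly done via per-cell softmax entropy. The sketch below is a minimal, hypothetical illustration of that idea (the function names and the threshold are assumptions, not U4D's actual API):

```python
import math

def softmax(logits):
    """Convert a list of class logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy of a categorical distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_mask(logit_map, threshold):
    """Flag high-entropy (semantically ambiguous) cells.

    logit_map: per-cell class logits from a pretrained segmentation head.
    Returns a boolean mask marking "hard" regions to model first,
    plus the raw entropy values.
    """
    ent = [entropy(softmax(cell)) for cell in logit_map]
    return [e > threshold for e in ent], ent
```

In this sketch, a cell with near-uniform class logits (e.g. a cluttered boundary) gets entropy near log(K) and is routed to the first "hard" generation stage, while confidently classified cells are left for the uncertainty-conditioned completion stage.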

Abstract

Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. In this work, we present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors. To further ensure temporal coherence, U4D incorporates a mixture of spatio-temporal (MoST) block that adaptively fuses spatial and temporal representations during diffusion. Extensive experiments show that U4D produces geometrically faithful and temporally consistent LiDAR sequences, advancing the reliability of 4D world modeling for autonomous perception and simulation.
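The abstract describes the MoST block as adaptively fusing spatial and temporal representations during diffusion. One common way to realize such adaptive fusion is a learned per-element gate that mixes the two streams; the toy sketch below illustrates only that gating idea under stated assumptions (the function names, the sigmoid gate, and the per-element mixing are illustrative, not the paper's architecture):

```python
import math

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(spatial, temporal, gate_logits):
    """Adaptively mix spatial and temporal features.

    For each element, a learned gate g in (0, 1) decides how much
    of the spatial stream vs. the temporal stream passes through:
        out = g * spatial + (1 - g) * temporal
    In a MoST-style block, gate_logits would come from a small
    learned network; here they are given directly for illustration.
    """
    out = []
    for s, t, z in zip(spatial, temporal, gate_logits):
        g = sigmoid(z)
        out.append(g * s + (1.0 - g) * t)
    return out
```

With a strongly positive gate logit the output follows the spatial feature; with a strongly negative one it follows the temporal feature, which is the mechanism that lets such a block trade off per-frame geometry against cross-frame consistency.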