Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

arXiv cs.RO / 4/15/2026


Key Points

  • Habitat-GS is a navigation-focused embodied AI simulator built on Habitat-Sim that swaps mesh-based rendering for dynamic 3D Gaussian Splatting (3DGS) to improve visual realism for training agents.
  • The system adds a “Gaussian avatar” module for dynamic human modeling, where each avatar functions both as a photorealistic visual presence and as an explicit navigation obstacle, teaching agents human-aware behaviors.
  • It maintains full compatibility with the Habitat ecosystem and supports scalable 3DGS asset import from diverse sources to ease scene creation and reuse.
  • Experiments on point-goal navigation indicate that training with 3DGS scenes improves cross-domain generalization, and that mixed-domain training is the most effective approach.
  • Benchmarks show Habitat-GS can scale across different scene complexities and avatar counts while keeping performance suitable for simulation-based learning.
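The core rendering idea the first key point refers to — replacing mesh rasterization with 3D Gaussian Splatting — boils down to depth-sorting translucent Gaussians and alpha-compositing them front to back. The sketch below illustrates that compositing step at a single pixel; it is a generic 3DGS-style illustration with assumed field names (`mean`, `cov`, `color`, `opacity`, `depth`), not Habitat-GS's actual renderer.

```python
import numpy as np

def composite_pixel(pixel, gaussians):
    """Front-to-back alpha compositing of 2D-projected Gaussians at one pixel.

    Each gaussian is a dict with 'mean' (2,), 'cov' (2x2), 'color' (3,),
    'opacity' (scalar), and 'depth' (scalar). This is a didactic sketch of
    the 3DGS compositing rule, not Habitat-GS's implementation.
    """
    color = np.zeros(3)
    transmittance = 1.0
    for g in sorted(gaussians, key=lambda g: g["depth"]):  # near to far
        d = pixel - g["mean"]
        # Gaussian falloff of the splat's footprint at this pixel
        w = np.exp(-0.5 * d @ np.linalg.inv(g["cov"]) @ d)
        alpha = min(0.99, g["opacity"] * w)
        color += transmittance * alpha * g["color"]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination, as in 3DGS rasterizers
            break
    return color
```

Because compositing is order-dependent and saturates quickly, most of a pixel's color comes from the first few splats along the ray, which is what makes real-time photorealistic rendering feasible.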

Abstract

Training embodied AI agents depends critically on the visual fidelity of simulation environments and the ability to model dynamic humans. Current simulators rely on mesh-based rasterization with limited visual realism, and their support for dynamic human avatars, where available, is constrained to mesh representations, hindering agent generalization to human-populated real-world scenarios. We present Habitat-GS, a navigation-centric embodied AI simulator extended from Habitat-Sim that integrates 3D Gaussian Splatting scene rendering and drivable Gaussian avatars while maintaining full compatibility with the Habitat ecosystem. Our system implements a 3DGS renderer for real-time photorealistic rendering and supports scalable 3DGS asset import from diverse sources. For dynamic human modeling, we introduce a Gaussian avatar module that enables each avatar to simultaneously serve as a photorealistic visual entity and an effective navigation obstacle, allowing agents to learn human-aware behaviors in realistic settings. Experiments on point-goal navigation demonstrate that agents trained on 3DGS scenes achieve stronger cross-domain generalization, with mixed-domain training being the most effective strategy. Evaluations on avatar-aware navigation further confirm that Gaussian avatars enable effective human-aware navigation. Finally, performance benchmarks validate the system's scalability across varying scene complexity and avatar counts.
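The abstract's dual role for avatars — photorealistic visual entity and navigation obstacle — implies the simulator must answer a simple geometric query: does a proposed agent step collide with a human? A common way to do this is to give each avatar a cylinder proxy in the ground plane. The sketch below shows that check; the function and parameter names (`blocks_step`, `avatar_radius`, etc.) are assumptions for illustration, not Habitat-GS's API.

```python
import math

def blocks_step(agent_xy, step_xy, avatar_xy, avatar_radius, agent_radius=0.2):
    """Check whether a straight-line agent step collides with an avatar,
    approximated as a vertical cylinder (a circle in the ground plane).

    Generic proxy-geometry illustration, not Habitat-GS's actual collision code.
    """
    ax, ay = agent_xy
    sx, sy = step_xy
    cx, cy = avatar_xy
    # Closest point on the segment agent -> step to the avatar center
    dx, dy = sx - ax, sy - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0  # degenerate step: agent stays in place
    else:
        t = max(0.0, min(1.0, ((cx - ax) * dx + (cy - ay) * dy) / seg_len2))
    px, py = ax + t * dx, ay + t * dy
    dist = math.hypot(px - cx, py - cy)
    return dist < avatar_radius + agent_radius
```

During training, a check like this lets the simulator terminate or penalize steps that pass through a human, which is how an avatar can act as an "effective navigation obstacle" rather than a purely visual element.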