Learning Humanoid Navigation from Human Data

arXiv cs.RO / 4/2/2026


Key Points

  • The approach combines a diffusion model that predicts distributions of plausible future trajectories, a 360° visual memory that fuses color, depth, and semantic information, and appearance features extracted with a frozen DINOv3 backbone to capture cues that depth sensors may miss.
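One way to read the fusion step is that each modality (color, depth, semantics, and frozen DINOv3 appearance features) contributes its own embedding, and these are combined into a single conditioning input for the trajectory model. The sketch below is a minimal, hypothetical illustration of that idea; the feature dimensions and the plain concatenation are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings for one 360° observation.
# Dimensions are illustrative only, not taken from the paper.
rgb_feat = rng.standard_normal(256)    # color features
depth_feat = rng.standard_normal(128)  # depth features
sem_feat = rng.standard_normal(64)     # semantic-segmentation features
dino_feat = rng.standard_normal(384)   # frozen DINOv3 appearance features

def fuse_observation(feats):
    """Concatenate per-modality features into one conditioning vector."""
    return np.concatenate(feats)

cond = fuse_observation([rgb_feat, depth_feat, sem_feat, dino_feat])
print(cond.shape)  # (832,)
```

In practice the fusion could be a learned projection rather than raw concatenation; the point is only that appearance features from a frozen backbone can be carried alongside geometric ones.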

Abstract

We present EgoNav, a system that enables a humanoid robot to traverse diverse, unseen environments by learning entirely from 5 hours of human walking data, with no robot data or fine-tuning. A diffusion model predicts distributions of plausible future trajectories conditioned on the past trajectory, a 360° visual memory fusing color, depth, and semantics, and video features from a frozen DINOv3 backbone that capture appearance cues invisible to depth sensors. A hybrid sampling scheme achieves real-time inference in 10 denoising steps, and a receding-horizon controller selects paths from the predicted distribution. We validate EgoNav through offline evaluations, where it outperforms baselines in collision avoidance and multi-modal coverage, and through zero-shot deployment on a Unitree G1 humanoid across unseen indoor and outdoor environments. Behaviors such as waiting for doors to open, navigating around crowds, and avoiding glass walls emerge naturally from the learned prior. We will release the dataset and trained models. Our website: https://egonav.weizhuowang.com
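The receding-horizon control loop described in the abstract can be pictured as: sample a batch of candidate trajectories from the generative model, score them, execute only the first step of the best one, then resample at the next tick. The sketch below is a minimal, assumed implementation of that loop; the Gaussian sampler stands in for the diffusion model, and the cost function (goal distance plus collision penalty) is a common generic choice, not the paper's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(k=32, horizon=16):
    """Stand-in for the diffusion sampler: k candidate 2-D paths.
    (The real model denoises trajectories conditioned on visual memory.)"""
    steps = rng.normal(loc=(0.1, 0.0), scale=0.05, size=(k, horizon, 2))
    return np.cumsum(steps, axis=1)  # integrate step deltas into paths

def score(traj, goal, obstacles, clearance=0.3):
    """Lower is better: distance-to-goal plus a soft collision penalty."""
    goal_cost = np.linalg.norm(traj[-1] - goal)
    d = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    collision_cost = np.sum(np.maximum(0.0, clearance - d.min(axis=1)))
    return goal_cost + 10.0 * collision_cost

goal = np.array([2.0, 0.0])
obstacles = np.array([[0.8, 0.05]])  # one illustrative obstacle

# Receding horizon: pick the best candidate, execute only its first
# waypoint, then resample from scratch at the next control tick.
trajs = sample_trajectories()
best = min(trajs, key=lambda t: score(t, goal, obstacles))
next_waypoint = best[0]
print(next_waypoint.shape)  # (2,)
```

Replanning every tick is what lets behaviors like waiting at a door emerge: if all low-cost samples are blocked, the selected first step can simply be near-stationary until the distribution shifts.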