SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation
arXiv cs.RO / 3/31/2026
Key Points
- SpatialAnt is proposed as a zero-shot vision-and-language navigation framework that targets the real-world failure modes of existing methods, which depend on high-quality, human-crafted scene reconstructions.
- The approach improves monocular-based self-reconstruction by adding a physical grounding strategy to recover absolute metric scale, reducing scale ambiguity in learned priors.
- Rather than treating noisy self-reconstructed scenes as reliable spatial references, SpatialAnt uses visual anticipation: it renders future observations from the noisy point cloud and applies counterfactual reasoning to reject candidate paths that conflict with the instruction.
- Experiments in both simulated and real-world settings show substantial gains over prior zero-shot methods, reaching a 66% Success Rate on R2R-CE and 50.8% on RxR-CE.
- A physical deployment on a Hello Robot validates practical effectiveness with a reported 52% Success Rate in challenging real-world environments.
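The anticipation-and-rejection idea in the third key point can be sketched in a few lines. Everything below is a hypothetical illustration, not the paper's implementation: `anticipated_coverage` projects a noisy point cloud into a candidate future camera pose with a standard pinhole model, and `reject_conflicting_paths` uses a trivial pixel-coverage score as a stand-in for the paper's instruction-conditioned counterfactual check. All function names, parameters, and thresholds are assumptions.

```python
import numpy as np

def anticipated_coverage(points, cam_pos, cam_R, K, hw):
    """Render a (noisy) 3D point cloud from a candidate future camera
    pose; return the fraction of image pixels hit by at least one
    point -- a crude proxy for an anticipated observation."""
    h, w = hw
    # World -> camera: p_cam = R @ (p_world - t), written row-wise.
    pc = (points - cam_pos) @ cam_R.T
    pc = pc[pc[:, 2] > 1e-6]            # cull points behind the camera
    if len(pc) == 0:
        return 0.0
    uv = pc @ K.T
    uv = uv[:, :2] / uv[:, 2:3]         # pinhole projection
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    hit = np.zeros((h, w), dtype=bool)
    hit[v[inside], u[inside]] = True
    return hit.mean()

def reject_conflicting_paths(candidates, points, K, hw, min_coverage=0.05):
    """Counterfactual filter: drop candidate poses whose rendered view
    is nearly empty. (Here, low coverage stands in for 'conflicts with
    the instruction'; the paper scores rendered views against the
    language instruction instead.)"""
    return [(t, R) for (t, R) in candidates
            if anticipated_coverage(points, t, R, K, hw) >= min_coverage]

# Hypothetical usage: a flat "wall" of points 2 m ahead of the origin.
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 40), np.linspace(-0.5, 0.5, 40))
wall = np.stack([xs.ravel(), ys.ravel(), np.full(xs.size, 2.0)], axis=1)
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
forward = np.eye(3)                     # camera looking at the wall
backward = np.diag([-1.0, 1.0, -1.0])   # rotated 180 degrees about y
```

A pose facing the wall yields nonzero coverage and survives the filter; the pose facing away projects no points and is rejected, mirroring how an anticipated view that contradicts the instruction would prune that path.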