CATNAV: Cached Vision-Language Traversability for Efficient Zero-Shot Robot Navigation
arXiv cs.RO · March 25, 2026
Key Points
- CATNAV is a cost-aware, embodiment-aware zero-shot robot navigation framework that uses multimodal LLMs to generate traversability costmaps without task-specific training.
- It introduces visuosemantic caching to reuse prior risk assessments for semantically similar scenes, cutting online vision-language model (VLM) queries by 85.7%.
- CATNAV also includes a VLM-based trajectory selection module that visually reasons over candidate trajectories to pick the safest option while respecting behavioral constraints.
- In experiments with a quadruped robot in both indoor and outdoor unstructured environments, CATNAV outperforms state-of-the-art vision-language-action baselines, achieving an average goal-reaching rate 10 percentage points higher.
- Across five tasks, CATNAV reduces behavioral constraint violations by 33%, indicating improved safety and reliability in real-world-like navigation settings.
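The visuosemantic caching idea above can be sketched in a few lines: embed each incoming scene, compare it against cached scene embeddings, and reuse the stored costmap when similarity clears a threshold, only querying the VLM on a miss. The class and function names, the cosine-similarity metric, and the 0.9 threshold below are illustrative assumptions, not details from the paper.

```python
import numpy as np

class SceneCache:
    """Toy visuosemantic cache: reuses costmaps for similar scene embeddings.
    This is a minimal sketch, not CATNAV's actual implementation."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # cosine-similarity cutoff for a cache hit (assumed)
        self.embeddings = []        # cached, normalized scene embeddings
        self.costmaps = []          # traversability costmaps paired with embeddings

    def lookup(self, emb):
        """Return a cached costmap if a semantically similar scene exists, else None."""
        emb = emb / np.linalg.norm(emb)
        for cached, costmap in zip(self.embeddings, self.costmaps):
            if float(np.dot(emb, cached)) >= self.threshold:
                return costmap      # cache hit: the online VLM query is skipped
        return None

    def insert(self, emb, costmap):
        self.embeddings.append(emb / np.linalg.norm(emb))
        self.costmaps.append(costmap)

def get_costmap(cache, scene_embedding, query_vlm):
    """Reuse a cached risk assessment; fall back to a fresh VLM query on a miss."""
    cached = cache.lookup(scene_embedding)
    if cached is not None:
        return cached
    costmap = query_vlm()           # stand-in for the expensive online VLM call
    cache.insert(scene_embedding, costmap)
    return costmap
```

Under this sketch, two nearly identical scenes trigger only one VLM call, which is the mechanism behind the reported 85.7% reduction in online queries.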