MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation
arXiv cs.RO / 5/4/2026
Key Points
- MiniVLA-Nav v1 is a new simulation dataset for language-conditioned robot navigation under the Language-Conditioned Object Approach (LCOA) framework.
- It tasks an NVIDIA Nova Carter differential-drive robot with reaching and stopping at a referenced object, following natural-language instructions across four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Multiple Shelves).
- The dataset contains 1,174 episodes with synchronized 640×640 RGB images, metric depth maps (float32, in metres), and instance segmentation masks, plus continuous (v, ω) and tokenized expert action labels recorded at 60 Hz.
- It includes structured trajectory diversity via three spawn-distance tiers, along with multiple object categories, instruction templates, paraphrase OOD templates, and five evaluation splits for robustness and out-of-distribution testing.
- MiniVLA-Nav v1 is publicly released on Hugging Face for researchers and developers to train and benchmark navigation policies.
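The pairing of continuous (v, ω) commands with tokenized expert labels suggests a binning-style action tokenizer, as is common in VLA-style policies. A minimal Python sketch of that idea follows; the bin count and velocity ranges are illustrative assumptions, not values taken from the MiniVLA-Nav v1 release:

```python
# Sketch of a binning-based action tokenizer for (v, omega) pairs.
# All constants below are assumed for illustration, not from the dataset.

V_RANGE = (-0.5, 1.0)   # linear velocity range in m/s (assumed)
W_RANGE = (-1.5, 1.5)   # angular velocity range in rad/s (assumed)
N_BINS = 256            # discrete tokens per action dimension (assumed)

def _bin(value: float, lo: float, hi: float, n: int) -> int:
    """Clip value to [lo, hi] and map it to a bin index in [0, n-1]."""
    value = min(max(value, lo), hi)
    return min(int((value - lo) / (hi - lo) * n), n - 1)

def tokenize_action(v: float, omega: float) -> tuple[int, int]:
    """Map a continuous (v, omega) command to a pair of discrete tokens."""
    return (_bin(v, *V_RANGE, N_BINS), _bin(omega, *W_RANGE, N_BINS))

def detokenize_action(tv: int, tw: int) -> tuple[float, float]:
    """Recover the bin-centre (v, omega) from a token pair."""
    def centre(t: int, lo: float, hi: float, n: int) -> float:
        return lo + (t + 0.5) * (hi - lo) / n
    return (centre(tv, *V_RANGE, N_BINS), centre(tw, *W_RANGE, N_BINS))
```

Round-tripping through the tokenizer loses at most half a bin width per dimension, which is the usual trade-off when discretising expert actions for a token-prediction policy.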