DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
arXiv cs.CV / 4/3/2026
Key Points
- The paper introduces DriveDreamer-Policy, a geometry-grounded world-action model designed to unify generation (depth and future video) and planning (driving actions) in embodied driving tasks.
- It uses a large language model to integrate language instructions, multi-view images, and actions, then leverages three lightweight generators to produce depth, future video, and action outputs.
- By learning a geometry-aware world representation, the method improves the coherence of imagined futures and leads to more informed driving actions in a single modular architecture.
- Experiments on Navsim v1 and v2 show strong closed-loop planning and world generation results, achieving 89.2 PDMS (Navsim v1) and 88.7 EPDMS (Navsim v2), with gains over prior world-model-based approaches.
- Ablation results indicate that explicit depth learning provides complementary benefits, improving video imagination quality and increasing planning robustness.
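
The data flow described in the key points (one shared backbone fusing instructions, images, and actions, followed by three lightweight generators) can be illustrated with a toy sketch. This is not the paper's implementation: every name here (`shared_backbone`, `depth_generator`, `video_generator`, `action_generator`, `WorldActionOutputs`) is hypothetical, the arithmetic is a placeholder for learned networks, and the assumption that each generator reads only the shared context is ours, since the summary does not say how the generators are conditioned.

```python
from dataclasses import dataclass
import random


@dataclass
class WorldActionOutputs:
    depth: list          # H x W depth map (nested lists of floats)
    future_video: list   # T frames, each an H x W grid
    actions: list        # planned (x, y) waypoints


def shared_backbone(instruction, images, past_actions, dim=8):
    # Hypothetical stand-in for the large language model: it fuses all
    # inputs into one context vector. A real model would use learned tokens;
    # here the vector is pseudo-random, seeded by the input sizes.
    rng = random.Random(len(instruction) + len(images) + len(past_actions))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def depth_generator(ctx, h=2, w=3):
    # Placeholder head: derives a positive H x W "depth" grid from the context.
    base = abs(sum(ctx)) + 1.0
    return [[base + 0.1 * (r + c) for c in range(w)] for r in range(h)]


def video_generator(ctx, t=4, h=2, w=3):
    # Placeholder head: emits T constant-valued frames that drift over time.
    mean = sum(ctx) / len(ctx)
    return [[[mean + 0.01 * k for _ in range(w)] for _ in range(h)]
            for k in range(t)]


def action_generator(ctx, horizon=5):
    # Placeholder head: emits a straight line of waypoints along x.
    step = ctx[0]
    return [(step * (k + 1), 0.0) for k in range(horizon)]


def forward(instruction, images, past_actions):
    # One backbone pass, then three independent heads on the same context.
    ctx = shared_backbone(instruction, images, past_actions)
    return WorldActionOutputs(
        depth=depth_generator(ctx),
        future_video=video_generator(ctx),
        actions=action_generator(ctx),
    )


out = forward("turn left at the junction", ["front", "left", "right"], [(0.0, 0.0)])
print(len(out.depth), len(out.future_video), len(out.actions))
```

The point of the sketch is only the shape of the design: a single fused representation feeds all three outputs, so depth, future video, and actions are produced from the same understanding of the scene.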