Judge, Then Drive: A Critic-Centric Vision Language Action Framework for Autonomous Driving
arXiv cs.CV / 5/1/2026
Key Points
- The paper introduces CriticVLA, a two-stage vision-language-action (VLA) framework for autonomous driving that explicitly uses the model's ability to critique candidate trajectories, rather than only mapping inputs directly to actions.
- CriticVLA first proposes a rough trajectory and then refines it via multimodal evaluation and single-step optimization guided by a VLA-based critic, improving closed-loop decision quality.
- To strengthen the critic’s reasoning, the authors build a large synthetic dataset with 12.9 million annotated trajectories across diverse driving scenarios.
- Experiments on the Bench2Drive benchmark show that CriticVLA outperforms existing state-of-the-art methods, reaching a 73.33% total success rate with gains of roughly 30% in difficult scenarios.
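
The propose-then-refine loop described in the key points can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`propose_rough_trajectory`, `toy_critic_cost`, `refine_single_step`) are hypothetical, the critic is a simple quadratic lane-centering cost rather than a VLA model, and "single-step optimization" is interpreted as one gradient-descent step on the critic's cost.

```python
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float]   # (longitudinal x, lateral y) in meters
Trajectory = List[Waypoint]


def propose_rough_trajectory(n_points: int = 5, lateral_offset: float = 1.0) -> Trajectory:
    """Stage 1 stand-in (hypothetical): a straight path that sits off the lane center."""
    return [(float(i), lateral_offset) for i in range(n_points)]


def toy_critic_cost(traj: Trajectory) -> float:
    """Hypothetical critic: lower is better.

    Penalizes squared lateral distance from the lane center (y = 0).
    The paper's critic is a VLA model; this quadratic cost is only a placeholder.
    """
    return sum(y * y for _, y in traj)


def refine_single_step(
    traj: Trajectory,
    critic: Callable[[Trajectory], float],
    step_size: float = 0.25,
    eps: float = 1e-4,
) -> Trajectory:
    """Stage 2 stand-in: one gradient-descent step on the critic cost.

    The gradient with respect to each waypoint's lateral position is
    estimated with forward finite differences, so any black-box critic works.
    """
    base = critic(traj)
    refined: Trajectory = []
    for i, (x, y) in enumerate(traj):
        bumped = list(traj)
        bumped[i] = (x, y + eps)
        grad_y = (critic(bumped) - base) / eps
        refined.append((x, y - step_size * grad_y))
    return refined


rough = propose_rough_trajectory()
refined = refine_single_step(rough, toy_critic_cost)
print("critic cost before:", toy_critic_cost(rough))
print("critic cost after: ", toy_critic_cost(refined))
```

With this toy cost, a single step moves every waypoint toward the lane center, so the critic's cost drops after refinement. A learned critic would replace `toy_critic_cost` without changing the surrounding loop.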