Bench2Drive-VL: Benchmarks for Closed-Loop Autonomous Driving with Vision-Language Models
arXiv cs.RO / 4/3/2026
Key Points
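- The paper introduces Bench2Drive-VL, a benchmark suite that brings closed-loop evaluation to autonomous driving based on vision-language models (VLMs). It addresses the limitations of existing open-loop QA benchmarks, which score answers on pre-recorded data rather than on the consequences of the model's own driving decisions.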
- It proposes DriveCommenter, which automatically generates diverse, behavior-grounded question-answer pairs across all CARLA driving situations, including rare and severe out-of-distribution events such as off-route and off-road deviations.
- The work provides a unified protocol and interface to plug modern VLMs directly into the Bench2Drive closed-loop environment for fair comparison with traditional driving agents.
- It includes a flexible reasoning and control framework supporting multi-format visual inputs and configurable graph-based chain-of-thought execution.
- The authors release an end-to-end development ecosystem with open-source code and annotated datasets to enable reproduction and further research.
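
The closed-loop setting and the unified interface in the key points can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: `Control`, `DrivingAgent.run_step`, `RuleBasedAgent`, `VLMAgent`, and the one-dimensional `run_closed_loop` world are hypothetical names, not the Bench2Drive-VL or CARLA API, and the "VLM" is any function mapping a prompt string to an answer string. The sketch shows the property that separates closed-loop from open-loop evaluation. Each control command changes the state the agent observes next, so a poor decision compounds instead of being scored once on a fixed frame.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Control:
    # Hypothetical command type, loosely mirroring CARLA's throttle/brake in [0, 1] and steer in [-1, 1].
    throttle: float = 0.0
    brake: float = 0.0
    steer: float = 0.0


class DrivingAgent(Protocol):
    # Hypothetical shared interface: VLM-based and traditional agents expose the same step method.
    def run_step(self, observation: dict) -> Control: ...


class RuleBasedAgent:
    """A 'traditional' baseline: brake when the obstacle is closer than 15 m."""

    def run_step(self, observation: dict) -> Control:
        if observation["gap_m"] < 15.0:
            return Control(brake=1.0)
        return Control(throttle=0.5)


class VLMAgent:
    """Hypothetical adapter that turns any prompt -> answer function into a driving agent."""

    def __init__(self, ask: Callable[[str], str]):
        self.ask = ask

    def run_step(self, observation: dict) -> Control:
        prompt = (f"Speed {observation['speed_mps']:.1f} m/s, obstacle "
                  f"{observation['gap_m']:.1f} m ahead. Answer ACCELERATE or STOP.")
        answer = self.ask(prompt).strip().upper()
        # Parse the free-text answer into a control command.
        return Control(brake=1.0) if "STOP" in answer else Control(throttle=0.5)


def run_closed_loop(agent: DrivingAgent, steps: int = 50, dt: float = 0.1) -> dict:
    """Toy 1-D world: each control changes the state the agent observes next."""
    speed, gap = 8.0, 40.0  # ego speed (m/s) and distance to a stopped obstacle (m)
    for _ in range(steps):
        control = agent.run_step({"speed_mps": speed, "gap_m": gap})
        accel = 3.0 * control.throttle - 8.0 * control.brake  # m/s^2
        speed = max(0.0, speed + accel * dt)
        gap -= speed * dt
        if gap <= 0.0:
            return {"collided": True, "gap_m": gap}
    return {"collided": False, "gap_m": gap}


if __name__ == "__main__":
    print(run_closed_loop(RuleBasedAgent())["collided"])                      # False
    print(run_closed_loop(VLMAgent(lambda prompt: "ACCELERATE"))["collided"])  # True
```

Because both agents sit behind the same `run_step` signature, one loop scores a rule-based baseline and a language-model adapter alike, which is the kind of like-for-like comparison with traditional agents that a common protocol is meant to enable. In this toy world a model that always answers ACCELERATE hits the obstacle, while the baseline stops short of it.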

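The configurable graph-based chain-of-thought can be pictured as a dependency graph of reasoning questions. Each question is asked only after the questions it depends on have been answered, and their answers are passed along as context. The sketch below assumes a hypothetical four-stage graph (`perception`, `prediction`, `planning`, `control`), made-up prompts, and a generic `ask` callable standing in for a VLM. `run_reasoning_graph` is an illustrative helper, not the paper's implementation. It orders the nodes with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical reasoning graph: each node lists the nodes whose answers it needs.
REASONING_GRAPH: dict[str, set[str]] = {
    "perception": set(),
    "prediction": {"perception"},
    "planning": {"perception", "prediction"},
    "control": {"planning"},
}

# Hypothetical question text for each node.
QUESTIONS: dict[str, str] = {
    "perception": "Which objects around the ego vehicle matter right now?",
    "prediction": "How will those objects move in the next few seconds?",
    "planning": "What should the ego vehicle do next, and why?",
    "control": "Give throttle, brake, and steer values for that plan.",
}


def run_reasoning_graph(graph: dict[str, set[str]], questions: dict[str, str],
                        ask: Callable[[str], str]) -> dict[str, str]:
    """Ask every question once, after all of its prerequisites, passing their answers as context."""
    answers: dict[str, str] = {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        context = "\n".join(f"{dep}: {answers[dep]}" for dep in sorted(graph.get(node, set())))
        question = f"Q: {questions[node]}"
        answers[node] = ask(f"{context}\n{question}" if context else question)
    return answers


if __name__ == "__main__":
    # Stand-in model that records each prompt and returns a numbered placeholder answer.
    prompts: list[str] = []

    def echo_model(prompt: str) -> str:
        prompts.append(prompt)
        return f"answer {len(prompts)}"

    print(list(run_reasoning_graph(REASONING_GRAPH, QUESTIONS, echo_model)))
    # ['perception', 'prediction', 'planning', 'control']
```

Editing the `REASONING_GRAPH` dictionary is enough to add, remove, or reorder stages, which is one plausible reading of "configurable" here. A cyclic configuration makes `static_order()` raise `graphlib.CycleError` before any model call is made, so a malformed graph fails early.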


