ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving

arXiv cs.CV / 4/6/2026


Key Points

  • ExploreVLA addresses a key limitation of end-to-end Vision-Language-Action (VLA) autonomous driving models trained via imitation learning by adding exploration capabilities beyond the expert behavior distribution.
  • The method performs dense world modeling with a learned world model: trajectory prediction is augmented with future RGB and depth image generation, providing richer visual and geometric supervision for the planning backbone.
  • It turns the world model's image prediction uncertainty into an intrinsic novelty measure, which, when the scenario is judged safe, guides policy exploration toward out-of-distribution yet learnable scenarios.
  • The policy is trained with a safety-gated reward optimized via Group Relative Policy Optimization (GRPO), combining exploration and safety constraints.
  • On the NAVSIM and nuScenes benchmarks, ExploreVLA reports state-of-the-art results, including a PDMS of 93.7 and an EPDMS of 88.8 on NAVSIM; the authors plan to release code and demos.
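The safety-gated intrinsic reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `intrinsic_novelty` uses mean per-pixel L2 error between predicted and observed future frames as a stand-in for the world model's prediction uncertainty, and `safety_gated_reward`, `beta`, and the boolean safety check are hypothetical names and simplifications.

```python
import numpy as np

def intrinsic_novelty(pred_frames: np.ndarray, true_frames: np.ndarray) -> float:
    """Proxy for world-model prediction uncertainty: mean squared error
    between predicted and actually observed future frames. High error
    suggests the trajectory is out-of-distribution (i.e., novel)."""
    return float(np.mean((pred_frames - true_frames) ** 2))

def safety_gated_reward(novelty: float, is_safe: bool,
                        base_reward: float, beta: float = 0.1) -> float:
    """Safety-gated reward (illustrative): the exploration bonus is added
    only when the rollout passes the safety check; unsafe rollouts
    receive no novelty incentive."""
    return base_reward + beta * novelty if is_safe else base_reward
```

In this sketch, an unsafe but highly novel trajectory earns no bonus, which keeps exploration pressure from rewarding dangerous behavior.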

Abstract

End-to-end autonomous driving models based on Vision-Language-Action (VLA) architectures have shown promising results by learning driving policies through behavior cloning on expert demonstrations. However, imitation learning inherently limits the model to replicating observed behaviors without exploring diverse driving strategies, leaving it brittle in novel or out-of-distribution scenarios. Reinforcement learning (RL) offers a natural remedy by enabling policy exploration beyond the expert distribution. Yet VLA models, typically trained on offline datasets, lack directly observable state transitions, necessitating a learned world model to anticipate action consequences. In this work, we propose a unified understanding-and-generation framework that leverages world modeling to simultaneously enable meaningful exploration and provide dense supervision. Specifically, we augment trajectory prediction with future RGB and depth image generation as dense world modeling objectives, requiring the model to learn fine-grained visual and geometric representations that substantially enrich the planning backbone. Beyond serving as a supervisory signal, the world model further acts as a source of intrinsic reward for policy exploration: its image prediction uncertainty naturally measures a trajectory's novelty relative to the training distribution, where high uncertainty indicates out-of-distribution scenarios that, if safe, represent valuable learning opportunities. We incorporate this exploration signal into a safety-gated reward and optimize the policy via Group Relative Policy Optimization (GRPO). Experiments on the NAVSIM and nuScenes benchmarks demonstrate the effectiveness of our approach, achieving a state-of-the-art PDMS score of 93.7 and an EPDMS of 88.8 on NAVSIM. The code and demo will be publicly available at https://zihaosheng.github.io/ExploreVLA/.
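The GRPO optimization step mentioned in the abstract relies on group-relative advantages: rewards from a group of trajectories sampled for the same scene are normalized against the group's own mean and standard deviation, avoiding a learned value critic. A minimal sketch of that advantage computation (the function name and `eps` argument are illustrative, not from the paper):

```python
import numpy as np

def grpo_advantages(group_rewards, eps: float = 1e-8) -> np.ndarray:
    """Group Relative Policy Optimization advantage estimate: each sampled
    trajectory's reward is standardized within its group, so trajectories
    better than the group average get positive advantage and worse ones
    negative, with no separate value network required."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)
```

These per-trajectory advantages would then weight a clipped policy-gradient objective, as in PPO-style updates.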