ReSim: Reliable World Simulation for Autonomous Driving

arXiv cs.CV / 4/29/2026


Key Points

  • ReSim is a driving world simulation approach designed to stay reliable under hazardous and non-expert ego behaviors, which existing driving world models struggle to follow because their real-world training data consists mainly of safe expert trajectories.
  • It builds a controllable world model by augmenting real-world human demonstrations with diverse non-expert trajectories collected in a driving simulator (e.g., CARLA), starting from a video generator with a diffusion transformer architecture and adding strategies for integrating conditioning signals.
  • To connect high-fidelity simulation with decision-making tasks that require reward signals, ReSim introduces a Video2Reward module that estimates rewards from simulated futures.
  • The paper reports gains including up to 44% higher visual fidelity, over 50% improved controllability for both expert and non-expert actions, and performance improvements on NAVSIM for planning (2%) and policy selection (25%).
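
The policy selection use case in the points above can be illustrated with a minimal sketch: simulate one future per candidate action, estimate a reward from each simulated future, and keep the best-scoring candidate. This is not the paper's implementation. The names `select_action`, `toy_simulate`, and `toy_reward` are hypothetical, and the toy functions are simple numeric stand-ins for a video world model and a Video2Reward-style estimator.

```python
from typing import Callable, List, Sequence, TypeVar

Action = TypeVar("Action")
Future = TypeVar("Future")


def select_action(
    candidates: Sequence[Action],
    simulate: Callable[[Action], Future],
    estimate_reward: Callable[[Future], float],
) -> Action:
    """Return the candidate whose simulated future has the highest estimated reward."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda action: estimate_reward(simulate(action)))


def toy_simulate(speed: float) -> List[float]:
    # Hypothetical stand-in for a world model: the "future" is the gap (in metres)
    # to a stopped car 20 m ahead after each of 5 one-second steps.
    return [20.0 - speed * t for t in range(1, 6)]


def toy_reward(future: List[float]) -> float:
    # Hypothetical stand-in for a reward estimator: a collision (gap <= 0)
    # is penalised, otherwise the reward is the fraction of the gap covered.
    if min(future) <= 0:
        return -1.0
    return (20.0 - future[-1]) / 20.0


best = select_action([1.0, 3.0, 5.0], toy_simulate, toy_reward)
print(best)  # 3.0: fastest speed that does not reach the stopped car
```

The point of the sketch is the dependency it makes visible: the ranking is only as trustworthy as the simulated future for the hazardous candidate (here, speed 5.0), which is why a world model that cannot follow non-expert actions is of limited use for this task.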

Abstract

How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. To close the gap between high-fidelity simulation and applications that require reward signals to judge different actions, we introduce a Video2Reward module that estimates a reward from ReSim's simulated future. Our ReSim paradigm achieves up to 44% higher visual fidelity, improves controllability for both expert and non-expert actions by over 50%, and boosts planning and policy selection performance on NAVSIM by 2% and 25%, respectively.
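
As a rough illustration of training on a heterogeneous corpus, the sketch below draws each batch from two pools: real-world expert clips and simulator-collected non-expert clips. The abstract does not state how the two sources are mixed, so the fixed `sim_fraction`, the function name `mixed_batch`, and the string clip identifiers are all assumptions made for this example.

```python
import random
from typing import List, Sequence


def mixed_batch(
    real_clips: Sequence[str],
    sim_clips: Sequence[str],
    batch_size: int,
    sim_fraction: float,
    rng: random.Random,
) -> List[str]:
    """Sample a batch with a fixed share of simulator clips (assumed mixing rule)."""
    if not 0.0 <= sim_fraction <= 1.0:
        raise ValueError("sim_fraction must be in [0, 1]")
    n_sim = round(batch_size * sim_fraction)
    n_real = batch_size - n_sim
    # Sample with replacement so small pools can still fill a batch.
    batch = rng.choices(real_clips, k=n_real) + rng.choices(sim_clips, k=n_sim)
    rng.shuffle(batch)  # avoid a fixed real-then-sim ordering inside the batch
    return batch


# Hypothetical clip identifiers; a real pipeline would load video and action data.
real = [f"real_{i}" for i in range(100)]
sim = [f"sim_{i}" for i in range(20)]
batch = mixed_batch(real, sim, batch_size=8, sim_fraction=0.25, rng=random.Random(0))
print(len(batch), sum(clip.startswith("sim") for clip in batch))  # 8 2
```

A fixed per-batch share is only one option. It keeps the rare non-expert behaviors present in every update even when the simulator pool is much smaller than the real-world pool.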