GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment

arXiv cs.RO / 5/1/2026


Key Points

  • GSDrive is a new end-to-end autonomous driving training framework that improves driving policies by combining imitation learning (IL) with reinforcement learning (RL), addressing the shortcomings of sparse, event-based rewards.
  • The approach uses 3D Gaussian Splatting (3DGS) to build a differentiable, physics-based simulation environment and to create reward signals grounded in simulated interactions.
  • It adds a flow-matching-based trajectory predictor to generate multiple candidate (multi-mode) trajectories, then rolls them out in the simulator to evaluate and compare prospective rewards (see the probing sketch after this list).
  • By providing dense, immediate feedback (rather than only catastrophic collision outcomes), GSDrive helps mitigate premature convergence to suboptimal behaviors.
  • Experiments on the reconstructed nuScenes dataset show that GSDrive outperforms existing simulation-based RL driving methods in closed-loop tests, and the code is publicly available.
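To make the probing idea concrete, here is a minimal sketch of the candidate-and-score loop described above: sample several trajectory modes from the predictor, roll each one out in the simulator, and accumulate a dense per-step reward instead of waiting for a terminal collision event. All interfaces here (`predictor.sample`, `sim.rollout`, the state attributes, and the reward terms and weights) are illustrative assumptions, not GSDrive's actual API; the paper's implementation is in the linked repository.

```python
import numpy as np

def dense_step_reward(speed, obstacle_dist, lateral_offset):
    """Immediate per-step feedback instead of a single terminal collision event.
    The terms and weights are illustrative, not the paper's actual reward."""
    r_progress = speed                        # reward forward progress
    r_clearance = min(obstacle_dist, 5.0)     # penalize near-misses continuously
    r_lane = -abs(lateral_offset)             # stay centered in the lane
    return r_progress + 0.5 * r_clearance + r_lane

def probe_trajectories(predictor, sim, obs, num_modes=6):
    """Sample multi-mode candidates, roll each out, and return the best-scoring one.
    `predictor.sample` and `sim.rollout` are hypothetical interfaces standing in
    for the flow-matching predictor and the 3DGS simulation environment."""
    candidates = [predictor.sample(obs) for _ in range(num_modes)]
    scores = []
    for traj in candidates:
        total = 0.0
        for state in sim.rollout(obs, traj):  # simulate the ego following this trajectory
            total += dense_step_reward(state.speed,
                                       state.min_obstacle_dist,
                                       state.lateral_offset)
        scores.append(total)
    best = int(np.argmax(scores))
    return candidates[best], scores
```

Because every step contributes to the score, policies receive a gradient of feedback well before any catastrophic event occurs, which is the mechanism the authors credit for avoiding premature convergence.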

Abstract

End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and temporal data quality degradation hinder long-term real-world deployment. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow-matching-based trajectory predictor within the 3DGS simulator, enabling multi-mode trajectory probing where candidate trajectories are rolled out to assess prospective rewards. This establishes a bidirectional knowledge exchange between IL and RL by grounding reward functions in physically simulated interaction signals, offering immediate dense feedback instead of sparse catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method surpasses existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
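For readers unfamiliar with flow matching, the trajectory predictor can be viewed as a learned velocity field that transports Gaussian noise to a plausible trajectory; sampling a candidate amounts to numerically integrating that field. Below is a minimal sketch assuming a PyTorch model `v_theta(x, t, cond)`, where `cond` encodes the driving scene; both names are assumptions introduced here, not taken from the paper.

```python
import torch

@torch.no_grad()
def sample_trajectory(v_theta, cond, horizon=8, dim=2, steps=20):
    """Generate one candidate trajectory by Euler-integrating the learned
    flow-matching velocity field from t=0 (pure noise) to t=1 (data).

    v_theta(x, t, cond) -> velocity with the same shape as x; `v_theta`
    and `cond` are illustrative stand-ins for the paper's predictor.
    """
    x = torch.randn(1, horizon, dim)          # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)          # current integration time
        x = x + dt * v_theta(x, t, cond)      # one Euler step along the flow
    return x.squeeze(0)                       # (horizon, dim) waypoints
```

Calling `sample_trajectory` several times with fresh noise yields the multiple modes that the probing loop then ranks in the simulator.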