Pseudo-Expert Regularized Offline RL for End-to-End Autonomous Driving in Photorealistic Closed-Loop Environments
arXiv cs.RO / 4/10/2026
Key Points
- The paper proposes a camera-only end-to-end offline RL approach for autonomous driving that trains from a fixed simulator dataset without additional exploration, aiming to avoid imitation-learning failure modes.
- To reduce the offline RL instability caused by overestimating out-of-distribution actions, the method regularizes training with pseudo ground-truth trajectories derived from expert logs.
- Experiments are performed in a neural rendering closed-loop environment learned from the public nuScenes dataset, focusing on driving safety and efficiency metrics.
- The authors report substantial improvements over imitation learning baselines, including lower collision rates and higher route completion.
- An open-source implementation is provided, enabling others to reproduce and build on the proposed pseudo-expert regularized offline RL framework.
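The regularization idea in the key points can be sketched as a policy loss that trades off value maximization against staying close to the pseudo-expert action, in the spirit of behavior-cloning-regularized offline RL (e.g. TD3+BC). This is a minimal illustrative sketch, not the paper's actual objective; all names and the weighting scheme here are assumptions.

```python
# Hedged sketch of a pseudo-expert regularized actor loss.
# The paper's exact formulation is not given in the summary; this
# shows the generic shape: RL term + imitation penalty toward the
# pseudo ground-truth (expert-log) trajectory action.

def pseudo_expert_actor_loss(q_value, policy_action, pseudo_expert_action, lam=1.0):
    """Combine value maximization with a squared-error penalty that
    keeps the policy near the pseudo-expert action.

    q_value: critic estimate Q(s, pi(s)) for the policy's action (float)
    policy_action / pseudo_expert_action: action vectors (lists of floats)
    lam: regularization weight trading off RL against imitation
    """
    bc_penalty = sum((a - e) ** 2
                     for a, e in zip(policy_action, pseudo_expert_action))
    # Minimizing this loss raises Q while pulling the policy toward
    # the pseudo-expert trajectory, limiting out-of-distribution actions.
    return -q_value + lam * bc_penalty
```

Larger `lam` pushes the policy closer to pure imitation; smaller `lam` lets the critic's value estimate dominate, which is where out-of-distribution overestimation becomes a risk.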