Dynamics Distillation for Efficient and Transferable Control Learning

arXiv cs.RO / 5/5/2026


Key Points

  • The paper proposes “Sim2Sim2Sim,” a framework that compresses high-fidelity vehicle simulator dynamics into a learned, parallelizable dynamics model for scalable reinforcement learning.
  • Control policies are trained entirely in the distilled (learned) dynamics environment and then deployed back into the original high-fidelity simulator to improve both optimization efficiency and transfer reliability.
  • The authors show that evaluating a learned dynamics model only by predictive accuracy is insufficient; the model should be judged by the quality of reinforcement-learning policies it enables.
  • The work targets robust control policy learning for autonomous driving by combining physical realism from simulation with computational scalability from learned models.
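The distillation step described above can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's code: it treats the high-fidelity simulator as a black-box step function, logs transitions from it, and fits a cheap surrogate dynamics model that can step many states in parallel. The `hifi_step` dynamics and the linear least-squares surrogate are both toy stand-ins.

```python
import numpy as np

def hifi_step(s, a):
    """Stand-in for one expensive high-fidelity simulator step (hypothetical
    linear vehicle-speed dynamics used purely for illustration)."""
    return 0.9 * s + 0.1 * a

rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 1))   # sampled states
A = rng.normal(size=(1000, 1))   # sampled actions
S_next = hifi_step(S, A)         # logged simulator transitions

# Distill: fit a surrogate s' = [s, a] @ W by least squares on the logged data.
X = np.hstack([S, A])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)

# The surrogate now advances thousands of states in parallel with one matmul,
# which is the property that makes large-scale RL rollouts cheap.
batch = np.hstack([rng.normal(size=(5000, 1)), rng.normal(size=(5000, 1))])
pred = batch @ W
```

A real instantiation would replace the linear fit with a learned neural dynamics model, but the workflow (collect transitions, fit a fast surrogate, roll it out in batch) is the same.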

Abstract

Robust control policy learning for autonomous driving requires training environments to be both physically realistic and computationally scalable, properties that existing simulators provide only in isolation. We introduce Sim2Sim2Sim, a framework that bridges high-fidelity vehicle simulation and scalable reinforcement learning by distilling simulator dynamics into a highly parallelizable learned dynamics model. By training control policies purely within this distilled environment and deploying them back into the high-fidelity source simulator, we demonstrate more efficient policy optimization and reliable transfer under challenging dynamics. We further show that predictive accuracy alone does not fully characterize a learned dynamics model's suitability as a reinforcement learning training environment, which should also be assessed by the quality of the policies it enables.
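The train-then-deploy-back loop in the abstract can also be sketched with toy components. Everything below is an illustrative assumption: the dynamics, the scalar-gain "policy," and the grid search are hypothetical stand-ins for the simulator, the RL policy, and the optimizer. The point is the structure: the policy is optimized purely against the distilled surrogate, then scored in the original (expensive) simulator.

```python
import numpy as np

def hifi_step(s, a):
    """Hypothetical high-fidelity simulator step (toy linear dynamics)."""
    return 0.9 * s + 0.1 * a

# Surrogate weights assumed to come from a prior distillation step.
W = np.array([[0.9], [0.1]])

def surrogate_step(s, a):
    """Cheap distilled dynamics model: s' = [s, a] @ W."""
    return np.hstack([s, a]) @ W

def rollout_cost(step, k, s0=1.0, T=50):
    """Quadratic rollout cost of the feedback policy a = -k * s
    under a given dynamics step function."""
    s = np.array([[s0]])
    cost = 0.0
    for _ in range(T):
        a = -k * s
        cost += float(s**2 + 0.01 * a**2)
        s = step(s, a)
    return cost

# "Train" entirely in the distilled environment: pick the best gain.
gains = np.linspace(0.0, 9.0, 91)
best_k = min(gains, key=lambda k: rollout_cost(surrogate_step, k))

# Deploy back: evaluate the same policy in the high-fidelity simulator.
hifi_cost = rollout_cost(hifi_step, best_k)
```

This also illustrates the paper's evaluation point: the quantity that ultimately matters is `hifi_cost`, the quality of the deployed policy, not just the surrogate's one-step prediction error.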