Simulator Adaptation for Sim-to-Real Learning of Legged Locomotion via Proprioceptive Distribution Matching

arXiv cs.RO / 4/14/2026


Key Points

  • The paper addresses the sim-to-real performance drop in simulation-trained legged locomotion by adapting simulator dynamics to better reflect real hardware behavior.
  • It proposes proprioceptive distribution matching, which compares hardware and simulation rollouts as distributions over joint observations and actions, avoiding time alignment and external privileged sensing.
  • The matching metric is used as a black-box objective to identify simulator parameters and to fit action-delta and residual actuator models for more accurate dynamics.
  • In sim-to-sim ablations on the Go2 quadruped, the method matches the parameter recovery and policy-performance gains of privileged state-matching baselines.
  • Real-world tests report substantial drift reduction from less than five minutes of hardware data, even for a challenging two-legged walking behavior, which suggests the method is practical for sim-to-real transfer.
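
The summary does not say which distribution distance the paper uses. As a rough sketch of the idea only, the snippet below assumes a sliced Wasserstein distance (a stand-in choice, not necessarily the authors' metric) and treats each rollout as an unordered bag of proprioceptive samples. The function name `sliced_wasserstein`, the 4-dimensional feature vectors, and the Gaussian test data are all hypothetical.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Approximate sliced 1-Wasserstein distance between two sample sets.

    x, y: arrays of shape (n_samples, n_features). Rows are treated as
    unordered samples, so no time alignment between x and y is needed.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Random unit directions onto which both sample sets are projected.
    dirs = rng.normal(size=(n_projections, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Compare projections on a shared quantile grid so sample counts may differ.
    grid = np.linspace(0.0, 1.0, 101)
    xq = np.quantile(x @ dirs.T, grid, axis=0)
    yq = np.quantile(y @ dirs.T, grid, axis=0)
    return float(np.mean(np.abs(xq - yq)))

# Hypothetical data standing in for stacked joint observations and actions.
rng = np.random.default_rng(1)
hardware = rng.normal(0.0, 1.0, size=(2000, 4))
sim_good = rng.normal(0.0, 1.0, size=(1500, 4))   # simulator that matches
sim_bad = rng.normal(0.5, 1.3, size=(1500, 4))    # simulator that is off
shuffled = hardware[rng.permutation(len(hardware))]

d_good = sliced_wasserstein(hardware, sim_good)
d_bad = sliced_wasserstein(hardware, sim_bad)
d_shuffled = sliced_wasserstein(hardware, shuffled)
print(f"matched sim: {d_good:.3f}, mismatched sim: {d_bad:.3f}, "
      f"time-shuffled hardware: {d_shuffled:.3e}")
```

Shuffling the hardware samples in time leaves the distance essentially at zero, which is the property that removes the need for time-aligned trajectories. The two sample sets can also differ in length.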

Abstract

Simulation-trained legged locomotion policies often exhibit performance loss on hardware due to dynamics discrepancies between the simulator and the real world, highlighting the need for approaches that adapt the simulator itself to better match hardware behavior. Prior work typically quantifies these discrepancies through precise, time-aligned matching of joint and base trajectories, a process that requires motion capture, privileged sensing, and carefully controlled initial conditions. We introduce a practical alternative based on proprioceptive distribution matching, which compares hardware and simulation rollouts as distributions of joint observations and actions, eliminating the need for time alignment or external sensing. Using this metric as a black-box objective, we explore adapting simulator dynamics through parameter identification, action-delta models, and residual actuator models. Our approach matches the parameter recovery and policy-performance gains of privileged state-matching baselines across extensive sim-to-sim ablations on the Go2 quadruped. Real-world experiments demonstrate substantial drift reduction using less than five minutes of hardware data, even for a challenging two-legged walking behavior. These results demonstrate that proprioceptive distribution matching provides a practical and effective route to simulator adaptation for sim-to-real transfer of learned legged locomotion.
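
To make "using this metric as a black-box objective" concrete, here is a toy sketch of parameter identification by distribution matching. Everything in it is an assumption for illustration: the 1-DoF joint model, the constants, the stand-in periodic "policy", the per-feature quantile distance, and the plain grid search are hypothetical and are not taken from the paper, which works with a Go2 quadruped and does not specify its optimizer in this summary.

```python
import numpy as np

DT, STEPS, KP, DRIVE_HZ = 0.01, 1000, 20.0, 0.7   # hypothetical toy constants

def rollout(damping, n_rollouts=32, seed=0, noise=0.0):
    """Simulate a toy 1-DoF joint; return unordered (q, qd, action) samples."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_rollouts)  # unknown start phase
    q = rng.uniform(-0.2, 0.2, size=n_rollouts)             # uncontrolled initial state
    qd = np.zeros(n_rollouts)
    samples = []
    for t in range(STEPS):
        # Stand-in "policy": periodic command with a little state feedback.
        action = np.sin(2.0 * np.pi * DRIVE_HZ * t * DT + phase) - 0.3 * q
        qdd = KP * (action - q) - damping * qd   # PD-like actuator, unit inertia
        qd = qd + DT * qdd                       # semi-implicit Euler step
        q = q + DT * qd
        samples.append(np.stack([q, qd, action], axis=1))
    data = np.concatenate(samples, axis=0)
    return data + noise * rng.normal(size=data.shape)

def marginal_distance(hw, sim):
    """Mean gap between per-feature quantiles, scaled by hardware spread.

    This compares marginals only, a simplification chosen for brevity.
    """
    grid = np.linspace(0.01, 0.99, 99)
    gap = np.abs(np.quantile(hw, grid, axis=0) - np.quantile(sim, grid, axis=0))
    return float(np.mean(gap / hw.std(axis=0)))

# "Hardware" data comes from the same toy model with a hidden damping value.
TRUE_DAMPING = 2.0
hardware = rollout(TRUE_DAMPING, seed=0, noise=0.01)

# Black-box search: the objective only sees simulator output, never gradients.
candidates = np.linspace(0.5, 4.0, 15)
distances = [marginal_distance(hardware, rollout(c, seed=1)) for c in candidates]
best_damping = float(candidates[int(np.argmin(distances))])
print(f"true damping: {TRUE_DAMPING}, identified damping: {best_damping}")
```

The hardware and simulated rollouts use different random phases and initial states, so no trajectory pair lines up in time. The search relies only on the pooled sample distributions. A grid search is used here for transparency; any derivative-free optimizer could take its place.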