Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

arXiv cs.RO / 4/22/2026


Key Points

  • The paper proposes a unified reinforcement learning framework that lets humanoid robots learn five distinct locomotion gaits while using a consistent policy structure, action space, and reward formulation.
  • Its central idea is a “selective” Adversarial Motion Prior (AMP) strategy that applies adversarial regularization only to periodic, stability-critical gaits to improve convergence and suppress erratic motion.
  • The approach deliberately omits AMP for the highly dynamic gaits (running and jumping), where its regularization would over-constrain agile behavior.
  • Policies are trained with PPO using domain randomization in simulation and then transferred to a real 12-DOF humanoid robot via zero-shot sim-to-real.
  • Experiments show that selective AMP outperforms a uniform AMP policy across all five gaits, with faster convergence, lower tracking error, and higher success rates on the stability-focused gaits, without sacrificing agility on the dynamic ones.
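The selective-AMP rule described above can be sketched as a simple per-gait reward switch. This is a minimal illustration, not the authors' implementation: the gait names follow the paper, but the style weight, the discriminator score `disc_score`, and the `-log(1 - D)` style-reward form are illustrative placeholders.

```python
import math

# Gaits where the adversarial motion prior is applied (per the paper).
AMP_GAITS = {"walking", "goose-stepping", "stair_climbing"}
# Dynamic gaits trained without adversarial regularization.
DYNAMIC_GAITS = {"running", "jumping"}

def total_reward(gait, task_reward, disc_score, style_weight=0.5):
    """Blend the AMP style reward into the task reward only for
    stability-critical gaits; dynamic gaits keep the task reward alone.

    disc_score: hypothetical discriminator output D in [0, 1), where
    higher means the motion looks more like the reference data.
    """
    if gait in AMP_GAITS:
        # A common AMP-style reward is -log(1 - D), clipped for safety.
        style = -math.log(max(1.0 - disc_score, 1e-6))
        return task_reward + style_weight * style
    return task_reward  # running / jumping: no adversarial term
```

For example, `total_reward("running", 1.0, 0.9)` returns the task reward unchanged, while `total_reward("walking", 1.0, 0.9)` adds a positive style bonus, which is the whole point of the selective scheme.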

Abstract

Learning diverse locomotion skills for humanoid robots in a unified reinforcement learning framework remains challenging due to the conflicting requirements of stability and dynamic expressiveness across different gaits. We present a multi-gait learning approach that enables a humanoid robot to master five distinct gaits -- walking, goose-stepping, running, stair climbing, and jumping -- using a consistent policy structure, action space, and reward formulation. The key contribution is a selective Adversarial Motion Prior (AMP) strategy: AMP is applied to periodic, stability-critical gaits (walking, goose-stepping, stair climbing) where it accelerates convergence and suppresses erratic behavior, while being deliberately omitted for highly dynamic gaits (running, jumping) where its regularization would over-constrain the motion. Policies are trained via PPO with domain randomization in simulation and deployed on a physical 12-DOF humanoid robot through zero-shot sim-to-real transfer. Quantitative comparisons demonstrate that selective AMP outperforms a uniform AMP policy across all five gaits, achieving faster convergence, lower tracking error, and higher success rates on stability-focused gaits without sacrificing the agility required for dynamic ones.
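The domain randomization mentioned above is a standard ingredient of zero-shot sim-to-real transfer: physics and sensing parameters are resampled each episode so the policy cannot overfit to one simulator configuration. A minimal sketch follows; the parameter names and ranges here are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical per-episode randomization ranges for sim-to-real training.
# These names and bounds are illustrative, not the authors' settings.
RANGES = {
    "friction":       (0.5, 1.5),   # ground friction coefficient
    "mass_scale":     (0.9, 1.1),   # multiplier on link masses
    "motor_strength": (0.8, 1.2),   # multiplier on joint torque limits
    "obs_noise_std":  (0.0, 0.02),  # std of added observation noise
}

def sample_domain(rng=random):
    """Draw one randomized simulator configuration for a training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
```

At the start of each PPO rollout, `sample_domain()` would be applied to the simulator, so the trained policy sees a distribution of dynamics rather than a single model.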