Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

arXiv cs.RO / 4/22/2026


Key Points

  • The paper proposes a unified reinforcement learning framework that lets humanoid robots learn five distinct locomotion gaits while using a consistent policy structure, action space, and reward formulation.
  • Its central idea is a “selective” Adversarial Motion Prior (AMP) strategy that applies adversarial regularization only to periodic, stability-critical gaits to improve convergence and suppress erratic motion.
  • The approach deliberately omits AMP for the highly dynamic gaits (running and jumping), where its regularization would over-constrain agile behavior.
  • Policies are trained with PPO using domain randomization in simulation and then transferred to a real 12-DOF humanoid robot via zero-shot sim-to-real.
  • Experiments show that selective AMP outperforms a uniform AMP policy across all five gaits, with faster convergence, lower tracking error, and higher success rates on the stability-focused gaits, without sacrificing agility on the dynamic ones.
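The selective-AMP rule described above can be sketched as a simple per-gait reward switch. This is a minimal illustration, not the authors' implementation: the gait names follow the paper, but the style weight, the discriminator score `disc_score`, and the `-log(1 - D)` style-reward form are illustrative placeholders.

```python
import math

# Gaits where the adversarial motion prior is applied (per the paper).
AMP_GAITS = {"walking", "goose-stepping", "stair_climbing"}
# Dynamic gaits trained without adversarial regularization.
DYNAMIC_GAITS = {"running", "jumping"}

def total_reward(gait, task_reward, disc_score, style_weight=0.5):
    """Blend the AMP style reward into the task reward only for
    stability-critical gaits; dynamic gaits keep the task reward alone.

    disc_score: hypothetical discriminator output D in [0, 1), where
    higher means the motion looks more like the reference data.
    """
    if gait in AMP_GAITS:
        # A common AMP-style reward is -log(1 - D), clipped for safety.
        style = -math.log(max(1.0 - disc_score, 1e-6))
        return task_reward + style_weight * style
    return task_reward  # running / jumping: no adversarial term
```

For example, `total_reward("running", 1.0, 0.9)` returns the task reward unchanged, while `total_reward("walking", 1.0, 0.9)` adds a positive style bonus, which is the whole point of the selective scheme.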

Abstract

Learning diverse locomotion skills for humanoid robots in a unified reinforcement learning framework remains challenging due to the conflicting requirements of stability and dynamic expressiveness across different gaits. We present a multi-gait learning approach that enables a humanoid robot to master five distinct gaits -- walking, goose-stepping, running, stair climbing, and jumping -- using a consistent policy structure, action space, and reward formulation. The key contribution is a selective Adversarial Motion Prior (AMP) strategy: AMP is applied to periodic, stability-critical gaits (walking, goose-stepping, stair climbing) where it accelerates convergence and suppresses erratic behavior, while being deliberately omitted for highly dynamic gaits (running, jumping) where its regularization would over-constrain the motion. Policies are trained via PPO with domain randomization in simulation and deployed on a physical 12-DOF humanoid robot through zero-shot sim-to-real transfer. Quantitative comparisons demonstrate that selective AMP outperforms a uniform AMP policy across all five gaits, achieving faster convergence, lower tracking error, and higher success rates on stability-focused gaits without sacrificing the agility required for dynamic ones.
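The domain randomization mentioned above is a standard ingredient of zero-shot sim-to-real transfer: physics and sensing parameters are resampled each episode so the policy cannot overfit to one simulator configuration. A minimal sketch follows; the parameter names and ranges here are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical per-episode randomization ranges for sim-to-real training.
# These names and bounds are illustrative, not the authors' settings.
RANGES = {
    "friction":       (0.5, 1.5),   # ground friction coefficient
    "mass_scale":     (0.9, 1.1),   # multiplier on link masses
    "motor_strength": (0.8, 1.2),   # multiplier on joint torque limits
    "obs_noise_std":  (0.0, 0.02),  # std of added observation noise
}

def sample_domain(rng=random):
    """Draw one randomized simulator configuration for a training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
```

At the start of each PPO rollout, `sample_domain()` would be applied to the simulator, so the trained policy sees a distribution of dynamics rather than a single model.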