RecoverFormer: End-to-End Contact-Aware Recovery for Humanoid Robots

arXiv cs.RO / 4/28/2026

Key Points

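RECOVERFORMER is an end-to-end control policy for recovering humanoid robots from unexpected disturbances. It learns to switch among several recovery behaviors (compensatory stepping, hand-environment contact, and center-of-mass reshaping) depending on the situation.
In addition to a causal transformer over a 50-step observation history, it introduces two new heads: one that represents the recovery mode as a latent variable to enable smooth strategy transitions, and a contact affordance head that predicts which contact surfaces (walls, railings, table edges) are useful for stabilization.
Evaluated on the Unitree G1 in MuJoCo, the policy is trained only on open floor yet transfers zero-shot to walled environments, achieving 100% recovery success across 100 to 300 N pushes and wall distances from 0.25 to 1.4 m.
It is also robust to dynamics mismatch (mass, latency, friction, and compound perturbations), with reported recovery success of 75.5% at +25% mass, 89% under 30 ms latency, 91.5% at low friction, and 99% under a compound friction, latency, and mass perturbation.
The latent modes specialize across force regimes on their own, without mode-level supervision, as validated by a t-SNE analysis of 300 episodes.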

Abstract

Humanoid robots operating in unstructured environments must recover from unexpected disturbances, a capability that remains challenging for end-to-end control policies. We present RECOVERFORMER, a fully end-to-end humanoid recovery policy that learns when and how to switch among recovery behaviors, including compensatory stepping, hand-environment contact, and center-of-mass reshaping, while maintaining robust performance under model mismatch. The architecture combines a causal transformer over a 50-step observation history with two novel heads: a latent recovery mode head that enables smooth transitions among distinct recovery strategies, and a contact affordance head that predicts which environmental surfaces (walls, railings, table edges) are beneficial for stabilization. We evaluate RECOVERFORMER on the Unitree G1 humanoid in MuJoCo. Trained only on open floor, RECOVERFORMER transfers zero-shot to walled environments, achieving 100% recovery success across 100 to 300 N pushes and across wall distances from 0.25 to 1.4 m. Under zero-shot dynamics mismatch, RECOVERFORMER reaches 75.5% recovery success at +25% mass, 89% under 30 ms latency, 91.5% at low friction, and 99% under a compound friction, latency, and mass perturbation. The learned latent modes specialize across force regimes without mode-level supervision, as validated by t-SNE analysis of 300 episodes. Taken together, these results show that a single end-to-end policy can deliver multi-modal, contact-aware humanoid recovery that generalizes across perturbation magnitude, contact geometry, and dynamics shift.
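
To make the architecture description more concrete, the following is a minimal pure-Python sketch of causal attention over a 50-step observation history feeding two output heads. It is not the authors' implementation. Only the history length of 50 comes from the abstract; the observation size, the number of latent modes, the number of candidate surfaces, the single unlearned attention layer, the random weights, and all function names are assumptions chosen for illustration.

```python
import math
import random

HISTORY_LEN = 50   # stated in the abstract: 50-step observation history
OBS_DIM = 4        # assumption: toy observation size
NUM_MODES = 3      # assumption: number of latent recovery modes
NUM_SURFACES = 3   # assumption: e.g. wall, railing, table edge


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention(history):
    """Single-head attention in which step t sees only steps 0..t.

    Queries, keys, and values are the raw observations (no learned
    projections), which keeps the sketch short.
    """
    scale = math.sqrt(len(history[0]))
    weights, context = [], []
    for t, query in enumerate(history):
        visible = history[: t + 1]  # the causal mask: drop future steps
        row = softmax([dot(query, key) / scale for key in visible])
        # Pad with zeros so every row has HISTORY_LEN entries.
        weights.append(row + [0.0] * (len(history) - t - 1))
        context.append(
            [dot(row, [v[d] for v in visible]) for d in range(len(query))]
        )
    return weights, context


def linear(x, weight_rows):
    """Bias-free linear layer: one output per weight row."""
    return [dot(row, x) for row in weight_rows]


def policy_heads(feature, w_mode, w_contact):
    """Hypothetical heads on top of the latest context vector."""
    # Soft mode weights: a distribution rather than a hard switch,
    # one way to allow smooth blending between strategies.
    mode_probs = softmax(linear(feature, w_mode))
    # Independent per-surface scores in (0, 1).
    contact_scores = [
        1.0 / (1.0 + math.exp(-z)) for z in linear(feature, w_contact)
    ]
    return mode_probs, contact_scores


rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


history = random_matrix(HISTORY_LEN, OBS_DIM)
weights, context = causal_attention(history)
mode_probs, contact_scores = policy_heads(
    context[-1],
    random_matrix(NUM_MODES, OBS_DIM),
    random_matrix(NUM_SURFACES, OBS_DIM),
)
print("mode weights:", mode_probs)
print("contact affordance scores:", contact_scores)
```

In this sketch the mode head outputs a probability distribution, so a downstream action decoder could blend strategies continuously, while the contact head scores each surface independently because several surfaces may be useful at once. Both choices are illustrative assumptions and are not details confirmed by the abstract.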