No More Marching: Learning Humanoid Locomotion for Short-Range SE(2) Targets

arXiv cs.RO · April 23, 2026


Key Points

  • The paper addresses how humanoid robots can efficiently reach short-range target poses in SE(2), with transitions that are fast, robust, and energy-efficient.
  • It criticizes prior learning-based locomotion methods for focusing on velocity tracking rather than direct pose reaching, which can cause inefficient “marching” behavior in short tasks.
  • The authors propose a reinforcement learning framework with a new constellation-based reward function that explicitly optimizes motion toward SE(2) target poses.
  • They introduce a benchmarking setup that evaluates energy use, time-to-target, and footstep count across a range of SE(2) goals.
  • Experiments indicate the method outperforms standard baselines and transfers successfully from simulation to real hardware, underscoring the value of task-targeted reward design.
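The paper does not spell out the constellation reward here, but one common construction behind the name attaches a small "constellation" of keypoints to the robot's base frame and penalizes their distance to the corresponding points in the target frame, so that position and heading error collapse into a single term. The sketch below illustrates that idea under those assumptions; the function name, offsets, and exponential shaping are illustrative, not the authors' actual formulation.

```python
import numpy as np

def constellation_reward(pose, target, offsets, scale=1.0):
    """Hypothetical SE(2) constellation reward (assumed form, not the paper's).

    pose, target: (x, y, theta) tuples in SE(2).
    offsets: (K, 2) array of body-frame points forming the "constellation".
    Returns a bounded reward in (0, 1], maximal when pose == target.
    """
    def transform(p, pts):
        # Rigidly map body-frame points into the world frame.
        x, y, th = p
        R = np.array([[np.cos(th), -np.sin(th)],
                      [np.sin(th),  np.cos(th)]])
        return pts @ R.T + np.array([x, y])

    cur = transform(pose, offsets)
    tgt = transform(target, offsets)
    err = np.linalg.norm(cur - tgt, axis=1).mean()  # mean keypoint distance
    return float(np.exp(-scale * err))              # bounded shaping term

# Example: three illustrative points ("nose" and two "shoulders").
offsets = np.array([[0.3, 0.0], [0.0, 0.15], [0.0, -0.15]])
r_at_goal = constellation_reward((1.0, 2.0, 0.5), (1.0, 2.0, 0.5), offsets)
# r_at_goal == 1.0, since the two poses coincide
```

Because each keypoint moves with both translation and rotation of the base, a single distance term penalizes heading error and position error jointly, which is what lets the policy trade them off instead of tracking a commanded velocity.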

Abstract

Humanoids operating in real-world workspaces must frequently execute task-driven, short-range movements to SE(2) target poses. To be practical, these transitions must be fast, robust, and energy efficient. While learning-based locomotion has made significant progress, most existing methods optimize for velocity tracking rather than direct pose reaching, resulting in inefficient, marching-style behavior when applied to short-range tasks. In this work, we develop a reinforcement learning approach that directly optimizes humanoid locomotion for SE(2) targets. Central to this approach is a new constellation-based reward function that encourages natural and efficient target-oriented movement. To evaluate performance, we introduce a benchmarking framework that measures energy consumption, time-to-target, and footstep count on a distribution of SE(2) goals. Our results show that the proposed approach consistently outperforms standard methods and enables successful transfer from simulation to hardware, highlighting the importance of targeted reward design for practical short-range humanoid locomotion.
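The benchmark's three metrics have natural per-episode estimators, even though the paper's exact definitions are not reproduced here: energy as integrated absolute mechanical power, time-to-target as the first step at which the goal pose is reached, and footsteps as foot-contact rising edges. A minimal sketch under those assumptions (all names and array shapes are hypothetical):

```python
import numpy as np

def episode_metrics(tau, qdot, contacts, dt, reach_step):
    """Hypothetical per-episode metrics (assumed forms, not the paper's code).

    tau, qdot: (T, J) joint torques [N·m] and velocities [rad/s].
    contacts:  (T, 2) boolean foot-contact flags (left, right).
    dt: control timestep in seconds.
    reach_step: step index when the target pose was first reached (T if never).
    """
    # Mechanical-work proxy for energy: integrate |torque * velocity|.
    energy = float(np.sum(np.abs(tau * qdot)) * dt)
    time_to_target = reach_step * dt
    # Count touchdown events: rising edges of each foot's contact flag.
    rising = (~contacts[:-1]) & contacts[1:]
    footsteps = int(rising.sum())
    return {"energy_J": energy,
            "time_s": time_to_target,
            "footsteps": footsteps}
```

Averaging these over a distribution of sampled SE(2) goals is what exposes the "marching" failure mode: a velocity-tracking policy can reach every goal yet still rack up excess footsteps and energy on the short-range ones.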