Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion
arXiv cs.RO / 3/31/2026
Key Points
- The paper introduces a cost-matching approach that integrates model predictive control (MPC) with reinforcement learning for humanoid locomotion, using a parameterized MPC cost formulation based on centroidal dynamics.
- It learns the MPC cost parameters by evaluating the cost-to-go along recorded state-action trajectories and updating the parameters to reduce the gap between MPC-predicted values and measured returns, enabling efficient gradient-based learning.
- The method is designed to avoid repeatedly solving the MPC optimization during training, significantly reducing computational burden compared with more direct MPC-in-the-loop learning setups.
- Experiments in simulation on a commercial humanoid platform show improved locomotion performance and increased robustness to model mismatch and external disturbances versus manually tuned baselines.
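The cost-matching idea in the second bullet can be illustrated with a minimal sketch. This is not the paper's implementation: all quantities below (a diagonal quadratic cost, synthetic state-action logs, noise-free returns) are simplifying assumptions. The key property it demonstrates is that when the MPC value prediction is linear in the cost parameters, matching predicted values to measured returns reduces to a regression solvable by plain gradient descent, with no MPC solve inside the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic recorded trajectory (hypothetical stand-in for robot logs):
# states X (dim n), actions U (dim m), horizon H for the value window.
T, H, n, m = 200, 10, 4, 2
X = rng.normal(size=(T, n))
U = rng.normal(size=(T, m))

# Ground-truth diagonal cost weights, unknown to the learner.
theta_true = np.abs(rng.normal(size=n + m)) + 0.1

def features(X, U, t, H):
    """Sum of quadratic features over an H-step window starting at t.
    For a diagonal quadratic cost c(x, u) = x^T diag(q) x + u^T diag(r) u,
    the predicted cost-to-go is linear in theta = (q, r):
    V_theta(t) = phi(t) @ theta."""
    end = min(t + H, len(X))
    return np.concatenate([
        (X[t:end] ** 2).sum(axis=0),
        (U[t:end] ** 2).sum(axis=0),
    ])

Phi = np.stack([features(X, U, t, H) for t in range(T - H)])
R = Phi @ theta_true  # "measured" returns (noise-free in this sketch)

# Cost matching: gradient descent on the squared gap between the
# MPC-predicted value Phi @ theta and the measured return R.
theta = np.ones(n + m)
hess_norm = 2.0 * np.linalg.norm(Phi.T @ Phi, 2) / len(R)
lr = 1.0 / hess_norm  # step size from the spectral norm of the Hessian
for _ in range(2000):
    gap = Phi @ theta - R
    grad = 2.0 * Phi.T @ gap / len(R)
    theta -= lr * grad

print("max parameter error:", np.max(np.abs(theta - theta_true)))
```

Because each gradient step only touches precomputed trajectory features, the loop avoids re-solving the MPC optimization during training, which is the computational saving the third bullet refers to.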
