PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning

arXiv cs.RO / March 26, 2026

Key Points

  • The paper introduces PCHC (Preference-Conditioned Humanoid Control) using Multi-Objective Reinforcement Learning to balance competing humanoid objectives like speed versus energy consumption.
  • It argues that existing RL approaches often rely on fixed weighting and yield only a single suboptimal policy, whereas the proposed method aims for diverse, Pareto-front-aligned behaviors.
  • The framework uses a Beta-distribution-based alignment mechanism driven by preference vectors to modulate a Mixture-of-Experts (MoE) module under one preference-conditioned policy.
  • Experiments across two humanoid tasks show that the robot can shift objective priorities in real time based on the provided preference condition, supported by both simulations and real-world tests.
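The Beta-distribution-based preference mechanism in the bullets above can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's implementation: it assumes a two-objective setting (speed vs. energy), draws the first preference weight from a Beta distribution, and scalarizes a multi-objective reward with that weight. The function names `sample_preference` and `scalarize` are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_preference(alpha: float = 1.5, beta: float = 1.5) -> np.ndarray:
    """Sample a 2-objective preference vector (w_speed, w_energy).

    Hypothetical sketch: draw the first weight from Beta(alpha, beta)
    and set the second so the weights sum to 1, covering the full
    range of speed/energy trade-offs during training.
    """
    w = rng.beta(alpha, beta)
    return np.array([w, 1.0 - w])

def scalarize(rewards: np.ndarray, pref: np.ndarray) -> float:
    """Linear scalarization of a per-objective reward vector."""
    return float(pref @ rewards)
```

Sampling preferences during training (rather than fixing one weighting) is what lets a single policy cover the spectrum of trade-offs; at deployment, the user supplies the preference vector directly.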

Abstract

Humanoid robots often need to balance competing objectives, such as maximizing speed while minimizing energy consumption. While current reinforcement learning (RL) methods can master complex skills like fall recovery and perceptive locomotion, they are constrained by fixed weighting strategies that produce a single suboptimal policy, rather than providing a diverse set of solutions for sophisticated multi-objective control. In this paper, we propose a novel framework leveraging Multi-Objective Reinforcement Learning (MORL) to achieve Preference-Conditioned Humanoid Control (PCHC). Unlike conventional methods that require training a series of policies to approximate the Pareto front, our framework enables a single, preference-conditioned policy to exhibit a wide spectrum of diverse behaviors. To effectively integrate these requirements, we introduce a Beta distribution-based alignment mechanism based on preference vectors modulating a Mixture-of-Experts (MoE) module. We validated our approach on two representative humanoid tasks. Extensive simulations and real-world experiments demonstrate that the proposed framework allows the robot to adaptively shift its objective priorities in real time based on the input preference condition.
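The abstract describes a preference vector modulating a Mixture-of-Experts module inside one shared policy. As a rough sketch of that idea (the paper's actual architecture, dimensions, and gating details are not specified here), the following assumes linear experts over observations and a gating network that maps the preference vector to mixture weights over experts. The class name `PreferenceGatedMoE` and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

class PreferenceGatedMoE:
    """Hypothetical sketch of a preference-modulated MoE policy head.

    Each expert is a linear map from observations to actions; a gating
    network maps the preference vector to mixture weights, so changing
    the preference re-blends the experts without retraining.
    """
    def __init__(self, obs_dim: int, act_dim: int, pref_dim: int, n_experts: int):
        self.experts = [rng.standard_normal((act_dim, obs_dim)) * 0.1
                        for _ in range(n_experts)]
        self.gate_w = rng.standard_normal((n_experts, pref_dim)) * 0.1

    def act(self, obs: np.ndarray, pref: np.ndarray) -> np.ndarray:
        gates = softmax(self.gate_w @ pref)              # preference -> expert weights
        outs = np.stack([E @ obs for E in self.experts])  # (n_experts, act_dim)
        return gates @ outs                               # weighted expert blend
```

Because the gate is conditioned only on the preference vector, shifting the input preference at deployment smoothly re-weights the experts, which is one plausible way a single policy could realize the real-time priority shifts the experiments demonstrate.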