Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

arXiv cs.AI / 5/1/2026


Key Points

  • The paper argues that many embodied agents rely on passive instruction-following or reactive behaviors, which prevents stable, long-term, value-guided self-direction and proper resolution of motivational conflicts.
  • It introduces ValuePlanner, a hierarchical architecture that separates high-level value scheduling from low-level action execution, using an LLM to reason over abstract value trade-offs and a classical PDDL planner to turn subgoals into executable plans.
  • A closed-loop feedback mechanism refines planning and execution over time.
  • To evaluate autonomy beyond simple task success, the authors propose a value-centric evaluation suite that measures cumulative value gain, preference alignment, and behavioral diversity.
  • Experiments in the TongSim household environment show that ValuePlanner arbitrates competing values and produces coherent, long-horizon, self-directed behavior that instruction-following and needs-driven baselines lack.
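The paper does not publish its implementation, but the two-layer control loop described above can be illustrated with a minimal, hypothetical sketch: a stand-in "value scheduler" (playing the role of the LLM cognitive module) picks a symbolic subgoal for the most-deprived value, and a stub "planner" (playing the role of the PDDL planner) maps that subgoal to primitive actions. All names and the plan library here are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    name: str
    value: str  # which abstract value this subgoal serves

# Hypothetical value scheduler standing in for the LLM cognitive module:
# selects the subgoal serving the currently most-deprived value.
def schedule_subgoal(value_levels: dict[str, float]) -> Subgoal:
    worst = min(value_levels, key=value_levels.get)
    return Subgoal(name=f"restore_{worst}", value=worst)

# Stand-in for the classical PDDL planner: a real system would solve a
# PDDL problem; here we just look up a fixed primitive action sequence.
PLAN_LIBRARY = {
    "restore_rest": ["goto(bed)", "sleep()"],
    "restore_order": ["goto(kitchen)", "tidy()"],
}

def plan(subgoal: Subgoal) -> list[str]:
    return PLAN_LIBRARY.get(subgoal.name, [])

def step(value_levels: dict[str, float]) -> list[str]:
    """One closed-loop iteration: schedule a subgoal, then plan for it.
    Execution feedback would update value_levels before the next call."""
    return plan(schedule_subgoal(value_levels))
```

With `step({"rest": 0.2, "order": 0.8})`, the scheduler targets the deprived `rest` value and the planner returns the corresponding action sequence, illustrating how value arbitration happens above the level of individual actions.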

Abstract

Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior and resolving motivational conflicts. We introduce ValuePlanner, a hierarchical cognitive architecture that decouples high-level value scheduling from low-level action execution. ValuePlanner employs an LLM-based cognitive module to generate symbolic subgoals by reasoning through abstract value trade-offs, which are then translated into executable action plans by a classical PDDL planner. This process is refined via a closed-loop feedback mechanism. Evaluating such autonomy requires methods beyond task-success rates, and we therefore propose a value-centric evaluation suite measuring cumulative value gain, preference alignment, and behavioral diversity. Experiments in the TongSim household environment demonstrate that ValuePlanner arbitrates competing values to generate coherent, long-horizon, self-directed behavior absent from instruction-following and needs-driven baselines. Our work offers a structured approach to bridging intrinsic values and grounded behavior for autonomous agents.
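The abstract names three evaluation metrics without giving their formulas. One plausible reading, sketched below purely as an assumption (the paper's exact definitions may differ), is: cumulative value gain as the summed increases in total value level over an episode, preference alignment as cosine similarity between the agent's observed time allocation and a target preference profile, and behavioral diversity as the Shannon entropy of the action distribution.

```python
import math
from collections import Counter

def cumulative_value_gain(value_trace: list[dict[str, float]]) -> float:
    """Sum over time of the increases in total value level (assumed form)."""
    totals = [sum(v.values()) for v in value_trace]
    return sum(max(b - a, 0.0) for a, b in zip(totals, totals[1:]))

def preference_alignment(time_spent: dict[str, float],
                         target: dict[str, float]) -> float:
    """Cosine similarity between observed time allocation and target
    preferences (assumed form)."""
    keys = sorted(set(time_spent) | set(target))
    x = [time_spent.get(k, 0.0) for k in keys]
    y = [target.get(k, 0.0) for k in keys]
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def behavioral_diversity(actions: list[str]) -> float:
    """Shannon entropy (in bits) of the agent's action distribution
    (assumed form)."""
    counts = Counter(actions)
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

These definitions capture the stated intent (rewarding value accrual, matching stated preferences, and avoiding degenerate repetitive behavior) even if the paper's concrete formulations differ.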