Context-Value-Action Architecture for Value-Driven Large Language Model Agents

arXiv cs.AI / 4/8/2026


Key Points

  • Existing LLM-agent evaluations tend to overstate performance because of the self-referential bias of "LLM-as-a-judge"; when validated against ground truth from real data, the paper shows that intensifying reasoning actually worsens value polarization.
  • To address this, the authors propose the Context-Value-Action (CVA) architecture, grounded in the Stimulus-Organism-Response (S-O-R) model and Schwartz's Theory of Basic Human Values, which decouples action generation from cognitive reasoning.
  • Rather than relying on the agent's own self-verification, CVA explicitly models dynamic value activation with a "Value Verifier" trained on authentic human data, aiming to curb value polarization.
  • On CVABench, which contains over 1.1 million real-world interaction traces, CVA substantially outperforms baselines, and the authors report that it mitigates polarization while achieving both high behavioral fidelity and interpretability.

Abstract

Large Language Models (LLMs) have shown promise in simulating human behavior, yet existing agents often exhibit behavioral rigidity, a flaw frequently masked by the self-referential bias of current "LLM-as-a-judge" evaluations. By evaluating against empirical ground truth, we reveal a counter-intuitive phenomenon: increasing the intensity of prompt-driven reasoning does not enhance fidelity but rather exacerbates value polarization, collapsing population diversity. To address this, we propose the Context-Value-Action (CVA) architecture, grounded in the Stimulus-Organism-Response (S-O-R) model and Schwartz's Theory of Basic Human Values. Unlike methods relying on self-verification, CVA decouples action generation from cognitive reasoning via a novel Value Verifier trained on authentic human data to explicitly model dynamic value activation. Experiments on CVABench, which comprises over 1.1 million real-world interaction traces, demonstrate that CVA significantly outperforms baselines. Our approach effectively mitigates polarization while offering superior behavioral fidelity and interpretability.
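To make the decoupling concrete, here is a minimal, hypothetical Python sketch of a Context → Value → Action step. The class and function names (ValueVerifier, infer_value_state, generate_candidates) and the scoring heuristic are illustrative assumptions, not the paper's implementation; only the list of Schwartz's ten basic values is taken from the literature.

```python
# Hypothetical sketch of the CVA flow: Context -> Value state -> Action,
# with a separate verifier (not the generator) judging value consistency.
# All names and the scoring logic below are illustrative stand-ins.

from dataclasses import dataclass

# Schwartz's ten basic human values (from Schwartz's theory).
SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

@dataclass
class ValueState:
    """Dynamic value activation over Schwartz's values (the 'O' in S-O-R)."""
    activation: dict[str, float]

class ValueVerifier:
    """Stand-in for the verifier trained on authentic human data.

    Instead of letting the agent self-verify its reasoning, this component
    scores how well a candidate action matches the inferred value state.
    """
    def score(self, state: ValueState, action: str) -> float:
        # Toy heuristic: reward actions that mention highly activated values.
        # A trained verifier would replace this with a learned score.
        return sum(w for v, w in state.activation.items() if v in action)

def infer_value_state(context: str) -> ValueState:
    # Placeholder for the Context -> Value step; a real system would use
    # a learned model conditioned on the interaction context.
    weights = {v: (1.0 if v in context else 0.1) for v in SCHWARTZ_VALUES}
    return ValueState(activation=weights)

def generate_candidates(context: str) -> list[str]:
    # Placeholder for LLM action generation, decoupled from verification.
    return [
        "donate to charity (benevolence)",
        "negotiate a raise (achievement, power)",
        "follow the group norm (conformity)",
    ]

def cva_step(context: str) -> str:
    """One CVA step: pick the candidate action the verifier prefers."""
    state = infer_value_state(context)
    verifier = ValueVerifier()
    return max(generate_candidates(context), key=lambda a: verifier.score(state, a))

if __name__ == "__main__":
    print(cva_step("a colleague asks for help; benevolence context"))
```

The design point reflected here, under those assumptions, is that a separately trained verifier, rather than the generating model itself, judges whether an action is consistent with the inferred value state, which is how CVA avoids the self-verification loop the abstract criticizes.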