Beyond State Consistency: Behavior Consistency in Text-Based World Models

arXiv cs.LG / 4/16/2026


Key Points

  • The paper argues that text-based world models evaluated with single-step state similarity metrics (e.g., Exact Match) fail to capture whether an agent’s behavior will actually remain consistent when actions are planned or evaluated.
  • It proposes a behavior-aligned training paradigm using a step-level metric called Behavior Consistency Reward (BehR), which quantifies how much the likelihood of a logged next action changes between the real state and the world-model-predicted state under a frozen Reference Agent.
  • Experiments on WebShop and TextWorld show BehR-based training improves long-term alignment, with the strongest improvements on WebShop and more limited changes in near-ceiling performance regimes.
  • The approach largely preserves or improves single-step prediction quality in most settings while also reducing false positives in offline surrogate evaluation.
  • Results indicate modest but promising gains for inference-time lookahead planning when using BehR-trained world models.

Abstract

World models have emerged as critical components for assessing the consequences of actions generated by interactive agents in online planning and offline evaluation. In text-based environments, world models are typically evaluated and trained with single-step metrics such as Exact Match, aiming to improve the similarity between predicted and real-world states, but such metrics have been shown to be insufficient for capturing actual agent behavior. To address this issue, we introduce a new behavior-aligned training paradigm aimed at improving the functional consistency between the world model and the real environment. This paradigm focuses on optimizing a tractable step-level metric named Behavior Consistency Reward (BehR), which measures how much the likelihood of a logged next action changes between the real state and the world-model-predicted state under a frozen Reference Agent. Experiments on WebShop and TextWorld show that BehR-based training improves long-term alignment in several settings, with the clearest gains in WebShop and less movement in near-ceiling regimes, while preserving or improving single-step prediction quality in three of four settings. World models trained with BehR also achieve lower false positives in offline surrogate evaluation and show modest but encouraging gains in inference-time lookahead planning.
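To make the BehR idea concrete, here is a minimal sketch of the step-level signal as described above: score how much a frozen Reference Agent's log-likelihood of the logged action shifts when the real state is swapped for the world model's predicted state. The exact functional form below (negative absolute log-probability shift), the function names, and the toy WebShop-style states are illustrative assumptions, not the paper's actual formulation.

```python
import math

def behr(ref_logprob, real_state, pred_state, logged_action):
    """Sketch of a Behavior Consistency Reward.

    Penalizes the shift in the frozen Reference Agent's log-likelihood
    of the logged action when the real state is replaced by the world
    model's prediction. A reward near 0 means the predicted state would
    leave the agent's behavior essentially unchanged. (The negative
    absolute shift is an assumed form, not the paper's definition.)
    """
    lp_real = ref_logprob(real_state, logged_action)
    lp_pred = ref_logprob(pred_state, logged_action)
    return -abs(lp_pred - lp_real)

# Toy frozen Reference Agent: a fixed table of per-state action
# log-probabilities (states/actions are made-up WebShop-like strings).
_TABLE = {
    ("cart_empty", "search[shoes]"): math.log(0.7),
    ("cart_empty_typo", "search[shoes]"): math.log(0.6),  # near-miss prediction
    ("wrong_page", "search[shoes]"): math.log(0.1),       # behavior-changing prediction
}

def toy_ref_logprob(state, action):
    return _TABLE[(state, action)]

# A prediction with a cosmetic error still supports the same behavior,
# so it scores closer to 0 than one that would change the agent's action.
r_near = behr(toy_ref_logprob, "cart_empty", "cart_empty_typo", "search[shoes]")
r_far = behr(toy_ref_logprob, "cart_empty", "wrong_page", "search[shoes]")
```

This illustrates why BehR can diverge from Exact Match: the near-miss state would fail an exact string comparison yet preserves the agent's behavior almost perfectly, while the behavior-changing state is penalized heavily.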