Improving Human Performance with Value-Aware Interventions: A Case Study in Chess

arXiv cs.AI / 4/17/2026


Key Points

  • The paper tackles a key challenge in AI-assisted sequential decision-making: deciding when and how an assistant should intervene in human actions.
  • It introduces “value-aware interventions” based on reinforcement learning principles, showing that mismatches between what a (human) suboptimal policy does and what would maximize immediate reward plus next-state value signal good intervention opportunities.
  • The authors model intervention as an MDP with an intervention budget, deriving an optimal single-intervention strategy and an approximate method for multiple interventions that ranks actions by the size of the policy-value discrepancy.
  • Evaluations in chess, using models of human behavior learned from large-scale gameplay data, show that in simulation the approach consistently outperforms interventions based on the strongest engine (Stockfish) across a wide range of settings.
  • A within-subject study with 20 players across 600 games finds the interventions significantly help low- and mid-skill players while performing comparably to expert-engine interventions for high-skill players.
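The intervention rule described in the key points can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the function names (`human_policy`, `human_value`, `reward`, `transition`) and the greedy budget allocation are assumptions made for clarity.

```python
def discrepancy(state, human_policy, human_value, reward, transition, actions):
    """Policy-value discrepancy at a state: the gap between the best
    one-step lookahead under the human's own value function and the
    value of the action the human policy actually takes."""
    def q(a):
        # Immediate reward plus the human value of the resulting state.
        return reward(state, a) + human_value(transition(state, a))
    best = max(q(a) for a in actions(state))
    chosen = q(human_policy(state))
    return best - chosen  # zero when the policy is value-consistent here

def pick_interventions(states, budget, *score_args):
    """Approximate multi-intervention strategy: spend the intervention
    budget on the states with the largest policy-value discrepancy."""
    ranked = sorted(states, key=lambda s: discrepancy(s, *score_args),
                    reverse=True)
    return ranked[:budget]
```

Note that the lookahead uses the *human* value function rather than an optimal one: an optimal-engine recommendation implicitly assumes optimal follow-up play, which a suboptimal human may not deliver.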

Abstract

AI systems are increasingly used to assist humans in sequential decision-making tasks, yet determining when and how an AI assistant should intervene remains a fundamental challenge. A potential baseline is to recommend the optimal action according to a strong model. However, such actions assume optimal follow-up actions, which human decision makers may fail to execute, potentially reducing overall performance. In this work, we propose and study value-aware interventions, motivated by a basic principle in reinforcement learning: under the Bellman equation, the optimal policy selects actions that maximize the immediate reward plus the value function. When a decision maker follows a suboptimal policy, this policy-value consistency no longer holds, creating discrepancies between the actions taken by the policy and those that maximize the immediate reward plus the value of the next state. We show that these policy-value inconsistencies naturally identify opportunities for intervention. We formalize this problem in a Markov decision process where an AI assistant may override human actions under an intervention budget. In the single-intervention regime, we show that the optimal strategy is to recommend the action that maximizes the human value function. For settings with multiple interventions, we propose a tractable approximation that prioritizes interventions based on the magnitude of the policy-value discrepancy. We evaluate these ideas in the domain of chess by learning models of humans from large-scale gameplay data. In simulation, our approach consistently outperforms interventions based on the strongest chess engine (Stockfish) in a wide range of settings. A within-subject human study with 20 players and 600 games further shows that our interventions significantly improve performance for low- and mid-skill players while matching expert-engine interventions for high-skill players.
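The policy-value consistency the abstract invokes can be written out in standard RL notation (a paraphrase of the Bellman condition under a deterministic transition $f$, as in chess; the symbols here are not taken from the paper):

```latex
% Bellman optimality: the optimal policy is greedy w.r.t. its own value.
\pi^{*}(s) \in \arg\max_{a}\Big[ r(s,a) + \gamma\, V^{\pi^{*}}\!\big(f(s,a)\big) \Big]

% For a suboptimal human policy \pi_H, the policy-value discrepancy
\Delta(s) = \max_{a}\Big[ r(s,a) + \gamma\, V^{\pi_H}\!\big(f(s,a)\big) \Big]
          - \Big[ r\big(s,\pi_H(s)\big) + \gamma\, V^{\pi_H}\!\big(f(s,\pi_H(s))\big) \Big]
```

$\Delta(s) = 0$ exactly when $\pi_H$ is value-consistent at $s$; a large $\Delta(s)$ flags the state as a promising intervention opportunity, which is the quantity the multi-intervention approximation ranks by.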
