The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

arXiv cs.AI / 4/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that LLMs can systematically fail when a salient surface cue contradicts an implicit feasibility constraint, reflecting a heuristic-over-constraint reasoning vulnerability.
  • Using causal-behavioral analysis on the “car wash problem” across six models, the authors find distance cues dominate the goal signal and that attribution patterns align more with keyword associations than true compositional inference.
  • The proposed Heuristic Override Benchmark (HOB) evaluates 14 models on 500 minimal-pair instances across multiple heuristic and constraint families, showing generally low strict accuracy (no model above 75%) and especially poor performance on presence constraints.
  • The authors show that small interventions—such as emphasizing the key object or prompting models to enumerate preconditions—can materially improve results, indicating the issue is often constraint inference rather than missing underlying knowledge.
  • Cross-model parametric probes suggest the same “sigmoid heuristic” behavior generalizes to other heuristic types (cost/efficiency/semantic similarity), and removing constraints can further degrade performance due to conservative bias.
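The "sigmoid heuristic" described above can be pictured as a logistic response to the surface cue alone: the model's answer tracks the stated distance almost independently of the goal context. A minimal illustrative sketch (the slope `k` and midpoint `d0` are hypothetical values, not fitted parameters from the paper):

```python
import math

def walk_probability(distance_km: float, k: float = 1.2, d0: float = 2.0) -> float:
    """Hypothetical logistic fit: P(model answers "walk") as a function of
    the stated distance, roughly independent of the goal/context.
    k = slope, d0 = midpoint distance; both values are illustrative."""
    return 1.0 / (1.0 + math.exp(k * (distance_km - d0)))
```

Under such a fit, the answer flips from "walk" to "drive" around `d0` regardless of whether walking is even compatible with the stated goal, which is exactly the heuristic-over-constraint behavior the paper diagnoses.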

Abstract

Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB) -- 500 instances spanning 4 heuristic × 5 constraint families with minimal pairs and explicitness gradients -- demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the failure lies in constraint inference rather than missing knowledge; 12/14 models perform worse when the constraint is removed (up to -39 pp), revealing conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers +6 to 9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
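The strict evaluation criterion in the abstract (an instance counts only if all 10 sampled answers are correct) can be sketched in a few lines. The data layout below is an assumption for illustration, not the paper's actual harness:

```python
def strict_accuracy(results: dict[str, list[bool]]) -> float:
    """Strict (10/10) accuracy over benchmark instances.

    results maps an instance id to the correctness of each of its
    sampled answers (assumed here to be 10 booleans per instance).
    An instance is solved only if every sample is correct."""
    solved = sum(all(samples) for samples in results.values())
    return solved / len(results)

# Example: one instance passes all samples, one fails a single sample.
demo = {"inst-1": [True] * 10, "inst-2": [True] * 9 + [False]}
print(strict_accuracy(demo))  # → 0.5
```

This all-samples criterion is deliberately harsher than mean per-sample accuracy, which is why even strong models stay below 75% on HOB.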
