The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

arXiv cs.AI / 4/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that LLMs can systematically fail when a salient surface cue contradicts an implicit feasibility constraint, reflecting a heuristic-over-constraint reasoning vulnerability.
  • Using causal-behavioral analysis on the “car wash problem” across six models, the authors find distance cues dominate the goal signal and that attribution patterns align more with keyword associations than true compositional inference.
  • The proposed Heuristic Override Benchmark (HOB) evaluates 14 models on 500 minimal-pair instances across multiple heuristic and constraint families, showing generally low strict accuracy (no model above 75%) and especially poor performance on presence constraints.
  • The authors show that small interventions—such as emphasizing the key object or prompting models to enumerate preconditions—can materially improve results, indicating the issue is often constraint inference rather than missing underlying knowledge.
  • Cross-model parametric probes suggest the same “sigmoid heuristic” behavior generalizes to other heuristic types (cost/efficiency/semantic similarity), and removing constraints can further degrade performance due to conservative bias.
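The "sigmoid heuristic" described above can be pictured as a logistic response to the surface cue alone: the model's answer tracks the stated distance almost independently of the goal context. A minimal illustrative sketch (the slope `k` and midpoint `d0` are hypothetical values, not fitted parameters from the paper):

```python
import math

def walk_probability(distance_km: float, k: float = 1.2, d0: float = 2.0) -> float:
    """Hypothetical logistic fit: P(model answers "walk") as a function of
    the stated distance, roughly independent of the goal/context.
    k = slope, d0 = midpoint distance; both values are illustrative."""
    return 1.0 / (1.0 + math.exp(k * (distance_km - d0)))
```

Under such a fit, the answer flips from "walk" to "drive" around `d0` regardless of whether walking is even compatible with the stated goal, which is exactly the heuristic-over-constraint behavior the paper diagnoses.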

Abstract

Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB) -- 500 instances spanning 4 heuristic × 5 constraint families with minimal pairs and explicitness gradients -- demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the failure lies in constraint inference rather than missing knowledge; 12/14 models perform worse when the constraint is removed (up to -39 pp), revealing conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers +6 to 9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
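The strict evaluation criterion in the abstract (an instance counts only if all 10 sampled answers are correct) can be sketched in a few lines. The data layout below is an assumption for illustration, not the paper's actual harness:

```python
def strict_accuracy(results: dict[str, list[bool]]) -> float:
    """Strict (10/10) accuracy over benchmark instances.

    results maps an instance id to the correctness of each of its
    sampled answers (assumed here to be 10 booleans per instance).
    An instance is solved only if every sample is correct."""
    solved = sum(all(samples) for samples in results.values())
    return solved / len(results)

# Example: one instance passes all samples, one fails a single sample.
demo = {"inst-1": [True] * 10, "inst-2": [True] * 9 + [False]}
print(strict_accuracy(demo))  # → 0.5
```

This all-samples criterion is deliberately harsher than mean per-sample accuracy, which is why even strong models stay below 75% on HOB.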
