When AI Shows Its Work, Is It Actually Working? Step-Level Evaluation Reveals Frontier Language Models Frequently Bypass Their Own Reasoning
arXiv cs.CL / 3/25/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that many frontier language models’ step-by-step “show your work” rationales are often decorative, because removing individual reasoning sentences usually does not change the final answer.
- It introduces a step-level evaluation method that removes one reasoning step at a time to measure "faithfulness," requiring only API access and costing about $1–$2 per model per task (see the sketch after this list).
- Tests of 10 frontier models on sentiment analysis, mathematics, topic classification, and medical QA found that, for most models, the final answer depends on any given step less than 17% of the time, suggesting that post-hoc narrative generation is common.
- The study finds faithfulness is highly model- and task-specific, with only a couple of models showing more genuine step dependence on certain tasks while still “shortcutting” others.
- Additional analysis points to "output rigidity" and mechanistic differences in chain-of-thought attention patterns, supporting the conclusion that training objectives, not just scale, determine whether reasoning is truly used.
Related Articles
The Security Gap in MCP Tool Servers (And What I Built to Fix It)
Dev.to

Adversarial AI framework reveals mechanisms behind impaired consciousness and a potential therapy
Reddit r/artificial
Why I Switched From GPT-4 to Small Language Models for Two of My Products
Dev.to
Orchestrating AI Velocity: Building a Decoupled Control Plane for Agentic Development
Dev.to
In the Kadrey v. Meta Platforms case, Judge Chhabria's quest to bust the fair use copyright defense to generative AI training rises from the dead!
Reddit r/artificial