Decoding the Critique Mechanism in Large Reasoning Models
arXiv cs.LG / 3/18/2026
Key Points
- Large Reasoning Models exhibit backtracking and self-verification; the paper argues these behaviors rest on a strong critique ability that detects errors and triggers self-correction.
- By deliberately inserting arithmetic mistakes into intermediate reasoning steps, the study shows that models can still arrive at correct final answers, suggesting a hidden internal critique mechanism that silently catches the injected errors (a minimal perturbation probe is sketched after this list).
- The authors identify a highly interpretable 'critique vector' in latent space and demonstrate that steering representations along this vector improves error detection without additional training (see the steering sketch after this list).
- Experiments across multiple model scales and families suggest the critique mechanism is robust and can be exploited to improve self-verification and test-time scaling.
- The authors provide code at https://github.com/mail-research/lrm-critique-vectors to reproduce and extend their results.
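To make the perturbation probe concrete, here is a minimal sketch of the error-insertion experiment described above. It is an illustration under assumptions, not the paper's code: the model name is a placeholder, and `corrupt_step` / `continue_from` are hypothetical helpers that flip one intermediate result and let the model continue from the corrupted trace.

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; the paper evaluates multiple families
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def corrupt_step(trace: str) -> str:
    """Flip the result of the first 'a + b = c' equation found in the trace."""
    m = re.search(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)", trace)
    if m is None:
        return trace
    wrong = int(m.group(3)) + 7  # deliberately wrong intermediate value
    return trace[: m.start(3)] + str(wrong) + trace[m.end(3):]

def continue_from(prefix: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a continuation of the (possibly corrupted) trace."""
    ids = tok(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

trace = "Q: What is (17 + 25) * 2?\nStep 1: 17 + 25 = 42.\nStep 2:"
print(continue_from(corrupt_step(trace)))  # does the model still reach 84 despite the bad step?
```

If the continuation still reaches 84, the model has implicitly detected and overridden the corrupted step, which is the behavioral signature the paper attributes to the hidden critique mechanism.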
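The 'critique vector' itself can be sketched in the same spirit. Assuming it is extracted as a difference-of-means direction between hidden states on corrupted versus clean reasoning steps (a common recipe for steering vectors; the paper's exact procedure may differ), the sketch below computes such a vector and adds it to a decoder layer's residual stream via a forward hook. `critique_vector`, `steer`, the layer layout `model.model.layers`, and the scale `alpha` are all assumptions for illustration.

```python
import torch

def critique_vector(h_err: torch.Tensor, h_ok: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction between hidden states of shape
    (n_samples, d_model) collected on corrupted vs. clean reasoning steps."""
    v = h_err.mean(dim=0) - h_ok.mean(dim=0)
    return v / v.norm()  # unit-normalize so alpha controls the shift magnitude

def steer(model, layer_idx: int, v: torch.Tensor, alpha: float = 4.0):
    """Shift the residual stream along v at one decoder layer during inference."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v.to(hidden)  # broadcasts over batch and sequence
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    # Layer layout assumed (Llama/Qwen-style); adjust for other families.
    return model.model.layers[layer_idx].register_forward_hook(hook)

# Toy check with random activations (d_model = 16); in real use the hidden
# states would be collected with output_hidden_states=True at a chosen layer.
v = critique_vector(torch.randn(32, 16) + 0.5, torch.randn(32, 16))
print(v.shape, float(v.norm()))  # -> torch.Size([16]), norm ≈ 1.0
```

The hook returned by `steer` can be detached with `.remove()` after generation. Layer index and `alpha` are free parameters that would need sweeping, and because the vector is fixed once extracted, the intervention requires no additional training, matching the key point above.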