Process Supervision via Verbal Critique Improves Reasoning in Large Language Models
arXiv cs.CL / 4/24/2026
Key Points
- The paper proposes “Verbal Process Supervision (VPS),” a training-free, inference-time framework that improves LLM reasoning by iteratively generating, critiquing, and refining outputs using structured natural-language critique from a stronger supervisor model (see the sketch after this list).
- VPS introduces a new scaling axis—critique granularity—alongside existing approaches like deeper chains, wider sampling, and learned step scorers (PRMs).
- On GPQA Diamond, VPS lets GPT-5.4 variants reach 94.9% at R=4, beating the prior state of the art (94.1%) without any gradient updates.
- On AIME 2025, VPS achieves “weak-actor rescue,” dramatically raising performance from 11.7–26.7% to 63.3–90.0% by guiding weaker models using verbal critique.
- Across GPQA and LiveCodeBench V6, VPS outperforms methods such as Reflexion and Self-Consistency at matched compute, with gains correlating strongly with the supervisor–actor capability gap and degrading when errors can’t be expressed linguistically (e.g., code synthesis).
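To make the generate–critique–refine loop concrete, here is a minimal Python sketch of how such a training-free procedure could be wired up. The function names, prompt format, acceptance check, and the reading of R as the number of critique rounds are assumptions for illustration, not the paper's actual implementation; the `actor` and `supervisor` callables stand in for whatever model-calling code you use.

```python
from typing import Callable

def verbal_process_supervision(
    problem: str,
    actor: Callable[[str], str],            # weaker model: prompt -> candidate answer
    supervisor: Callable[[str, str], str],  # stronger model: (problem, candidate) -> verbal critique
    is_accepted: Callable[[str], bool],     # parses the critique for an acceptance signal
    max_rounds: int = 4,                    # hypothetical reading of R: number of critique rounds
) -> str:
    """Illustrative generate -> critique -> refine loop with no gradient updates."""
    candidate = actor(problem)
    for _ in range(max_rounds):
        critique = supervisor(problem, candidate)
        if is_accepted(critique):
            break
        # Feed the supervisor's natural-language critique back to the actor as context.
        candidate = actor(
            f"{problem}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Supervisor critique:\n{critique}\n\nRevise your answer."
        )
    return candidate
```

Because the only feedback channel is natural language, the loop needs no learned reward model or step scorer, which is consistent with the bullet above about performance degrading when errors are hard to express linguistically.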



