Peer-Predictive Self-Training for Language Model Reasoning
arXiv cs.AI / 4/16/2026
Key Points
- The paper proposes Peer-Predictive Self-Training (PST), a label-free self-improvement method where multiple language models collaborate using a cross-model aggregated answer as an internal training target.
- During sequential response generation, PST quantifies how informative each intermediate response is about the final aggregate using pointwise mutual information (PMI), and scales fine-tuning updates accordingly.
- The method updates models less when their responses are already aligned with the aggregate and more when responses are less informative or misaligned, aiming to sharpen reasoning consistency.
- Experiments on mathematical reasoning benchmarks (SimulEq, Math500, MultiArith) show exact-match accuracy gains of 2.2–4.3 percentage points across Gemma-2-2B, LLaMA-3.2-1B, and Qwen-2.5-1.5B.
- PST also reduces the generator–verifier gap (GV-Gap) by 26–40% and requires no external supervision, indicating cross-model peer feedback can be an effective self-supervised training approach.
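The PMI-weighted update rule described in the key points can be sketched numerically. The function names, the log-ratio form of PMI, and the sigmoid mapping from PMI to an update scale are illustrative assumptions, not details taken from the paper:

```python
import math

def pmi(p_answer_given_response: float, p_answer: float) -> float:
    """Pointwise mutual information between an intermediate response and the
    cross-model aggregate answer: log p(answer | response) / p(answer).
    Both probabilities here are assumed inputs, not the paper's estimators."""
    return math.log(p_answer_given_response / p_answer)

def update_scale(pmi_value: float, alpha: float = 1.0) -> float:
    """Map PMI to a fine-tuning update weight (a hypothetical choice):
    responses that already predict the aggregate (high PMI) get a small
    update; uninformative or misaligned responses (low PMI) get a larger one."""
    return 1.0 / (1.0 + math.exp(alpha * pmi_value))

# A response that strongly predicts the aggregate answer...
aligned = update_scale(pmi(0.9, 0.5))
# ...versus one that is no more informative than the marginal.
uninformative = update_scale(pmi(0.5, 0.5))
print(aligned < uninformative)
```

Under this sketch, the scale factor would multiply each model's per-example fine-tuning loss, so well-aligned peers drift less while uninformative ones are pulled harder toward the aggregate.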