ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning
arXiv cs.LG · May 1, 2026
Key Points
- The paper proposes a shift from “learning to answer” to “learning to question”: a language model generates verifiable problem specifications, solves them, and uses verifier feedback to improve itself without human supervision.
- It introduces ANCORA, an anchored-curriculum self-play framework that alternates a Proposer (which creates new specifications) and a Solver (which generates verified solutions); training is stabilized by a two-level group-relative update, iterative self-distilled SFT, and a UCB-guided Curriculum DAG that grows only through strictly filtered, novel, verifier-checked specifications (see the sketch after this list).
- The authors argue these stabilization mechanisms are necessary because sparse verifier feedback can cause Proposer collapse even when rewards are aligned with MLRL.
- Experiments with the Verus verifier show a large gain in the Dafny2Verus test-time-training setting: pass@1 rises from 26.6% (SFT baseline) to 81.5% under 0-shot evaluation, beating a PSV self-play baseline by 15.8 points even though PSV uses 1-shot inference (pass@1 is sketched in code after this list).
- In a transfer setting initialized from Dafny2Verus seeds, the method achieves 36.2% pass@1 on held-out MBPP and 17.2% on HumanEval.
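To make the anchored self-play loop in the second bullet concrete, here is a minimal Python sketch. The `propose`, `solve`, `verify`, and `is_novel` callables, the `Node` type, and all other names are hypothetical stand-ins rather than the paper's API, and the selection rule shown is plain UCB1 rather than whatever variant ANCORA actually uses.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    spec: str                  # a verifier-checkable problem specification
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)

def ucb_score(node: Node, parent_visits: int, c: float = 1.4) -> float:
    """Plain UCB1: mean reward plus an exploration bonus."""
    if node.visits == 0:
        return float("inf")    # always try unexplored specs first
    mean = node.total_reward / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)

def self_play_round(root: Node, propose, solve, verify, is_novel) -> float:
    """One Proposer/Solver round against a toy curriculum DAG."""
    # 1. Descend the DAG greedily by UCB to pick an anchor spec.
    anchor = root
    while anchor.children:
        anchor = max(anchor.children,
                     key=lambda n: ucb_score(n, max(anchor.visits, 1)))
    # 2. Proposer: create a new spec conditioned on the anchor.
    new_spec = propose(anchor.spec)
    # 3. Solver attempts it; the verifier yields a binary reward.
    solution = solve(new_spec)
    reward = 1.0 if verify(new_spec, solution) else 0.0
    anchor.visits += 1
    anchor.total_reward += reward
    # 4. Strict filter: only verified AND novel specs grow the curriculum.
    if reward > 0 and is_novel(new_spec):
        anchor.children.append(Node(spec=new_spec))
    return reward
```

Calling `self_play_round(Node("seed spec"), ...)` with model-backed callables would run one step of the alternation; the two-level group-relative update and the iterative self-distilled SFT described in the paper would sit in the outer training loop around it.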
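The article does not say which estimator produced the pass@1 figures in the results bullets; the snippet below assumes the standard unbiased pass@k estimator of Chen et al. (2021), of which pass@1 is the k = 1 case.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, passes the verifier."""
    if n - c < k:
        return 1.0  # too few failures left for all k draws to fail
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 3 verified correct -> pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))
```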