QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
arXiv cs.AI / 4/7/2026
Key Points
- The paper introduces QED-Nano, a 4B open math-logic model post-trained to generate Olympiad-level proofs, addressing the cost and opacity of proprietary theorem-proving pipelines.
- Its training approach has three stages: supervised fine-tuning from DeepSeek-Math-V2 to establish proof-writing style, reinforcement learning with rubric-based rewards, and an expanded RL stage with a reasoning cache that iteratively summarizes and refines long proofs (see the sketch after this list).
- QED-Nano reportedly outperforms larger open proof models (e.g., Nomos-1 and GPT-OSS-120B) and approaches the performance of proprietary systems such as Gemini 3 Pro at far lower inference cost.
- To enable reproducibility and further research, the authors release the full training pipeline, including the QED-Nano/QED-Nano-SFT models, FineProofs datasets, and the associated training and evaluation code.
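
The digest does not spell out how the reasoning cache's summarize-and-refine loop works, but the general idea of compressing long reasoning into a rolling summary can be illustrated. The Python below is a minimal sketch of one plausible shape, assuming each round drafts a proof segment conditioned on a compact cache rather than the full transcript; every identifier in it (ReasoningCache, generate_segment, summarize_into_cache, MAX_ROUNDS) is hypothetical and stands in for model calls the paper itself would define.

```python
from dataclasses import dataclass, field

MAX_ROUNDS = 8  # assumed cap on summarize-and-refine iterations


@dataclass
class ReasoningCache:
    summary: str = ""  # compressed account of the reasoning so far
    open_goals: list = field(default_factory=list)  # unproven subgoals


def generate_segment(problem: str, cache: ReasoningCache) -> str:
    """Stub for a model call that extends the proof, conditioned on the
    problem and the compact cache rather than the full transcript --
    the compression is what would keep very long proofs tractable."""
    raise NotImplementedError("replace with an actual model call")


def summarize_into_cache(segment: str, cache: ReasoningCache) -> ReasoningCache:
    """Stub for a model call that folds the newest segment into the
    cache, retaining only what later refinement rounds need."""
    raise NotImplementedError("replace with an actual model call")


def summarize_and_refine(problem: str) -> str:
    """Draft, compress, and refine until no subgoals remain or the
    round budget runs out; return the stitched-together proof."""
    cache = ReasoningCache(open_goals=[problem])
    transcript = []
    for _ in range(MAX_ROUNDS):
        segment = generate_segment(problem, cache)
        transcript.append(segment)
        cache = summarize_into_cache(segment, cache)
        if not cache.open_goals:  # all subgoals discharged
            break
    return "\n\n".join(transcript)
```

The point of this shape, if it matches the paper's mechanism, is that context length stays bounded by the cache size rather than the proof length, which is what would let a 4B model sustain Olympiad-length arguments.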