Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP
arXiv cs.LG / 3/24/2026
Key Points
- The study reports that Claude Opus 4.6, using Rocq-specific Model Context Protocol (MCP) tools, autonomously proved 10 of 12 Putnam 2025 problems.
- The authors designed the MCP toolset around a "compile-first, interactive-fallback" workflow, informed by an analysis of logs from an earlier miniF2F-Rocq experiment.
- The agent ran in an isolated virtual machine with no internet access. The run was costly: 17.7 hours of active time, or about 51.6 hours wall-clock.
- The run used 141 subagents and consumed about 1.9 billion tokens, which shows how resource-intensive the proof search was.
- All resulting proofs are made publicly available, enabling replication and inspection of the generated formal derivations.
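The key points name a "compile-first, interactive-fallback" workflow but do not spell out how it works. As a rough illustration only, the Python sketch below shows one way such a control loop could be structured. It assumes three hypothetical callbacks, `draft_proof`, `compile_check` and `next_tactic`, which stand in for the model and the Rocq tooling. None of these names come from the paper or from Rocq-MCP, and the real toolset may work quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Attempt:
    proof: str           # final proof script (possibly incomplete on failure)
    ok: bool             # True if the checker accepted the proof
    mode: str            # "compile" or "interactive"
    log: List[str] = field(default_factory=list)  # checker errors seen along the way


def prove(
    goal: str,
    draft_proof: Callable[[str], str],                 # hypothetical: model writes a whole script
    compile_check: Callable[[str, str], Optional[str]],  # hypothetical: None on success, else error text
    next_tactic: Callable[[str, List[str], str], Optional[str]],  # hypothetical: next tactic or None
    max_steps: int = 20,
) -> Attempt:
    # Compile-first: draft the entire proof and check it in a single shot.
    proof = draft_proof(goal)
    err = compile_check(goal, proof)
    if err is None:
        return Attempt(proof, True, "compile")

    # Interactive fallback: grow the proof one tactic at a time,
    # feeding the latest checker error back to the model after each step.
    tactics: List[str] = []
    log = [err]
    for _ in range(max_steps):
        tac = next_tactic(goal, tactics, err)
        if tac is None:  # model gives up
            break
        tactics.append(tac)
        candidate = "\n".join(tactics)
        err = compile_check(goal, candidate)
        if err is None:
            return Attempt(candidate, True, "interactive", log)
        log.append(err)
    return Attempt("\n".join(tactics), False, "interactive", log)
```

The design choice illustrated here is simply ordering: the cheap whole-script check runs first, and the slower step-by-step loop is only entered when that check fails.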