Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP
arXiv cs.LG / 2026/3/24
📰 ニュースSignals & Early TrendsIdeas & Deep AnalysisTools & Practical UsageModels & Research
要点
- The study reports that Claude Opus 4.6, using Rocq-specific Model Context Protocol (MCP) tools, autonomously proved 10 of 12 Putnam 2025 problems.
- The MCP toolset was created by designing “compile-first, interactive-fallback” workflows based on analysis of logs from a prior miniF2F-Rocq experiment.
- The agent ran in an isolated virtual machine with no internet access and achieved proof generation using substantial compute (17.7 hours active, ~51.6 hours wall-clock).
- The run involved 141 subagents and consumed about 1.9 billion tokens, indicating high operational complexity for proof search.
- All resulting proofs are made publicly available, enabling replication and inspection of the generated formal derivations.
