Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv cs.LG / 2026/3/24

📰 ニュースSignals & Early TrendsIdeas & Deep AnalysisTools & Practical UsageModels & Research

要点

  • The study reports that Claude Opus 4.6, using Rocq-specific Model Context Protocol (MCP) tools, autonomously proved 10 of 12 Putnam 2025 problems.
  • The MCP toolset was created by designing “compile-first, interactive-fallback” workflows based on analysis of logs from a prior miniF2F-Rocq experiment.
  • The agent ran in an isolated virtual machine with no internet access and achieved proof generation using substantial compute (17.7 hours active, ~51.6 hours wall-clock).
  • The run involved 141 subagents and consumed about 1.9 billion tokens, indicating high operational complexity for proof search.
  • All resulting proofs are made publicly available, enabling replication and inspection of the generated formal derivations.

Abstract

We report on an experiment in which Claude Opus~4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6h wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.