Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv cs.LG / 2026/3/24



Key points

The study reports that Claude Opus 4.6, using Rocq-specific Model Context Protocol (MCP) tools, autonomously proved 10 of 12 Putnam 2025 problems.
The MCP tools were designed with Claude by analyzing logs from a prior miniF2F-Rocq experiment, and they encode a "compile-first, interactive-fallback" strategy.
The agent ran on an isolated virtual machine with no internet access and used 17.7 hours of active compute (51.6 hours wall-clock).
The run deployed 141 subagents and consumed about 1.9 billion tokens, a large budget for proof search.
All resulting proofs are made publicly available, enabling replication and inspection of the generated formal derivations.

Abstract

We report on an experiment in which Claude Opus 4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6 hours wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.
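The abstract names a "compile-first, interactive-fallback" strategy without spelling out its control flow. The Python below is a minimal sketch of one way such a loop could look, assuming a checker that accepts or rejects a whole proof script and an interactive session that applies one tactic at a time to the open goals. The names `check_script`, `run_tactic`, and `propose_tactics` are hypothetical callables introduced for illustration; they are not the paper's MCP tools or any Rocq API, and the toy stubs at the bottom stand in for a real Rocq process.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Goals = Tuple[str, ...]  # open goals, first one is worked on next


@dataclass
class Outcome:
    proved: bool
    mode: str  # "compile", "interactive", or "failed"
    tactics: List[str] = field(default_factory=list)


def compile_first_then_interactive(
    candidate_scripts: List[str],
    check_script: Callable[[str], Optional[str]],  # hypothetical: None = accepted, else error text
    initial_goals: Goals,
    run_tactic: Callable[[Goals, str], Optional[Goals]],  # hypothetical: new goals, or None on failure
    propose_tactics: Callable[[str], List[str]],  # hypothetical: candidate tactics for one goal
    max_steps: int = 100,
) -> Outcome:
    # Phase 1 (compile-first): submit whole proof scripts and stop at the
    # first one the checker accepts.
    for script in candidate_scripts:
        if check_script(script) is None:
            return Outcome(True, "compile")

    # Phase 2 (interactive fallback): work on the first open goal, keeping
    # the first proposed tactic that the session accepts.
    goals: Goals = initial_goals
    used: List[str] = []
    for _ in range(max_steps):
        if not goals:
            return Outcome(True, "interactive", used)
        for tactic in propose_tactics(goals[0]):
            new_goals = run_tactic(goals, tactic)
            if new_goals is not None:
                goals = new_goals
                used.append(tactic)
                break
        else:
            # No proposed tactic made progress on the current goal.
            return Outcome(False, "failed", used)
    if not goals:
        return Outcome(True, "interactive", used)
    return Outcome(False, "failed", used)


# Toy stand-ins (hypothetical): the checker rejects every whole script,
# which forces the fallback on a conjunction that splits into two subgoals.
def toy_check(script: str) -> Optional[str]:
    return "Error: tactic failure"


def toy_run(goals: Goals, tactic: str) -> Optional[Goals]:
    head, rest = goals[0], goals[1:]
    if head == "A /\\ B" and tactic == "split.":
        return ("A", "B") + rest
    if head in ("A", "B") and tactic == "assumption.":
        return rest
    return None


def toy_propose(goal: str) -> List[str]:
    return ["assumption.", "split."]


out = compile_first_then_interactive(["auto."], toy_check, ("A /\\ B",), toy_run, toy_propose)
print(out)
```

Under this reading, a whole-script compile costs a single round trip, and the per-tactic mode is used only when that fails, trading more calls for goal-level feedback.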
