Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv cs.LG / 3/24/2026



Key Points

The study reports that Claude Opus 4.6, using Rocq-specific Model Context Protocol (MCP) tools, autonomously proved 10 of 12 Putnam 2025 problems.
The MCP tools were designed together with Claude from an analysis of logs of a prior miniF2F-Rocq experiment, and they encode a “compile-first, interactive-fallback” strategy.
The agent ran in an isolated virtual machine with no internet access and used 17.7 hours of active compute (51.6 hours wall-clock).
The run deployed 141 subagents and consumed approximately 1.9 billion tokens.
All resulting proofs are made publicly available, enabling replication and inspection of the generated formal derivations.
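
To put the reported figures in proportion, the short Python sketch below derives a few averages from them. Only the five input numbers come from the report; the derived ratios are simple arithmetic, and the variable names are illustrative. The report does not say how tokens or time were distributed across problems or subagents, so these are run-wide averages, not per-problem measurements.

```python
# Figures as reported in the summary above.
SOLVED, TOTAL = 10, 12
SUBAGENTS = 141
TOKENS = 1.9e9            # "approximately 1.9 billion"
ACTIVE_HOURS = 17.7
WALL_CLOCK_HOURS = 51.6

# Derived averages (assumption: plain ratios over the whole run).
solve_rate = SOLVED / TOTAL
tokens_per_subagent = TOKENS / SUBAGENTS
tokens_per_solved_problem = TOKENS / SOLVED   # amortizes the 2 unsolved attempts too
active_fraction = ACTIVE_HOURS / WALL_CLOCK_HOURS

print(f"solve rate:                {solve_rate:.1%}")
print(f"tokens per subagent:       {tokens_per_subagent / 1e6:.1f} M")
print(f"tokens per solved problem: {tokens_per_solved_problem / 1e6:.0f} M")
print(f"active / wall-clock time:  {active_fraction:.1%}")
```

In other words, roughly five of every six problems were proved, each subagent averaged on the order of 13 million tokens, and about a third of the wall-clock time was active compute.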

Abstract

We report on an experiment in which Claude Opus 4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6h wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.
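
The abstract names the "compile-first, interactive-fallback" strategy without spelling out its mechanics. One plausible reading is: try to compile the whole proof script in one shot, and only step through it interactively when compilation fails. The Python sketch below illustrates that control flow under this assumption. Every name in it (`CompileResult`, `prove`, `compile_whole`, `repair_interactively`) is hypothetical and does not correspond to the actual Rocq-MCP tools, and the two stub functions are toy stand-ins for a real compiler and a real interactive session.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CompileResult:
    ok: bool
    error: str = ""


def prove(
    script: str,
    compile_whole: Callable[[str], CompileResult],
    repair_interactively: Callable[[str, str], Optional[str]],
    max_repairs: int = 3,
) -> Optional[str]:
    """Return a script that compiles, or None if the repair budget runs out."""
    for attempt in range(max_repairs + 1):
        # Compile first: the cheap path is a single whole-file check.
        result = compile_whole(script)
        if result.ok:
            return script
        if attempt == max_repairs:
            break
        # Interactive fallback: only on failure, inspect and patch the script.
        repaired = repair_interactively(script, result.error)
        if repaired is None:
            break
        script = repaired
    return None


# Toy stand-ins (assumptions): a script "compiles" once it has no `admit.`.
def fake_compile(script: str) -> CompileResult:
    if "admit." in script:
        return CompileResult(ok=False, error="unfinished goal")
    return CompileResult(ok=True)


def fake_repair(script: str, error: str) -> Optional[str]:
    return script.replace("admit.", "lia.", 1)


print(prove("intros. admit.", fake_compile, fake_repair))
```

The point of this ordering is that the interactive path is paid for only when the one-shot path has already failed; whether the actual tools budget repairs this way is not stated in the source.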



