Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4
arXiv cs.AI / 4/20/2026
Key Points
- The paper introduces "Hard Mode" automated theorem proving (ATP), in which an agent must first independently discover the answer and only then construct a formal Lean 4 proof, rather than receiving a statement with the final answer already embedded in it.
- It releases Hard Mode variants of two benchmarks, MiniF2F-Hard and FIMO-Hard, which experts have re-annotated to support more realistic ATP evaluation.
- It proposes Discover And Prove (DAP), an agentic framework that uses an LLM's natural-language reasoning with explicit self-reflection to find candidate answers. DAP then rewrites each Hard Mode problem into an "Easy Mode" form that existing automated provers can handle.
- DAP sets new state-of-the-art results. It raises the number of solved CombiBench problems from 7 to 10 (Pass@16), and it is the first system to formally prove 36 Putnam theorems in Hard Mode.
- The authors also report a large performance gap: top LLMs exceed 80% accuracy on Hard Mode problems, while formal provers manage under 10%. This suggests that Hard Mode benchmarks better expose the limitations that matter for real proof discovery.
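The following toy Lean 4 sketch makes the Hard Mode / Easy Mode distinction concrete. It assumes a PutnamBench-style convention in which the answer sits in a separate definition that the agent must fill in. The paper's benchmarks may encode this differently, and the toy problem and the names `toy_easy`, `toy_answer` and `toy_hard` are hypothetical, not taken from MiniF2F-Hard or FIMO-Hard.

```lean
-- Easy Mode: the answer (x = 5) is written into the statement,
-- so the prover only has to verify it.
theorem toy_easy (x : Nat) (h : 2 * x + 1 = 11) : x = 5 := by
  omega

-- Hard Mode (hypothetical encoding): the statement refers to an answer
-- definition whose value the agent must discover first.
-- Before discovery this would read `abbrev toy_answer : Nat := sorry`.
abbrev toy_answer : Nat := 5  -- value filled in by the discovery step

theorem toy_hard (x : Nat) (h : 2 * x + 1 = 11) : x = toy_answer := by
  -- Unfolding the discovered answer turns the goal into the Easy Mode goal.
  show x = 5
  omega
```

Once the answer is known, the remaining proof obligation is the Easy Mode one. This is the step that lets DAP reuse existing provers.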
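The discover-then-prove control flow can be sketched as a loop over candidate answers. This is a minimal sketch and not the authors' implementation. The `propose`, `reflect` and `prove` callables stand in for the LLM discovery step, the self-reflection check and an Easy Mode prover, and are hypothetical. The `?answer` placeholder token is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical token marking the hidden answer in a Hard Mode statement.
PLACEHOLDER = "?answer"


@dataclass
class Result:
    answer: Optional[str]
    easy_statement: Optional[str]
    proved: bool


def to_easy_mode(hard_statement: str, candidate: str) -> str:
    """Rewrite a Hard Mode statement into Easy Mode by inserting the candidate answer."""
    return hard_statement.replace(PLACEHOLDER, candidate)


def discover_and_prove(
    hard_statement: str,
    propose: Callable[[str], Iterable[str]],  # hypothetical LLM discovery step
    reflect: Callable[[str, str], bool],      # hypothetical self-reflection check
    prove: Callable[[str], bool],             # hypothetical Easy Mode prover call
) -> Result:
    for candidate in propose(hard_statement):
        # Self-reflection: skip candidates the model rejects on review.
        if not reflect(hard_statement, candidate):
            continue
        easy = to_easy_mode(hard_statement, candidate)
        # Hand the Easy Mode statement to the prover; stop at the first success.
        if prove(easy):
            return Result(candidate, easy, True)
    return Result(None, None, False)


if __name__ == "__main__":
    hard = "theorem t (x : Nat) (h : 2 * x + 1 = 11) : x = ?answer"
    result = discover_and_prove(
        hard,
        propose=lambda s: ["4", "5"],               # stubbed candidate answers
        reflect=lambda s, c: True,                  # stub: accept every candidate
        prove=lambda easy: easy.endswith("x = 5"),  # stub prover
    )
    print(result.answer)  # 5
```

Separating discovery from proving means a wrong candidate costs only one prover call. A correct one reduces the problem to an ordinary Easy Mode proof.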



