Automated Conjecture Resolution with Formal Verification

arXiv cs.LG / 4/7/2026


Key Points

  • The paper introduces an automated framework that combines an informal LLM-style reasoning agent with formal theorem verification to solve and check research-level math problems with minimal human input.
  • Rethlas, the informal agent, explores candidate proof strategies by combining reasoning primitives with a theorem search engine (Matlas); Archon, the formal agent, then translates the resulting informal argument into machine-checkable Lean 4 proofs.
  • Archon relies on iterative refinement, structured task decomposition, and automated proof synthesis to ensure the final solution is verifiably correct in Lean 4.
  • The authors report end-to-end resolution of an open commutative algebra problem, with the proof formally verified in Lean 4 and “essentially no human involvement.”
  • The work argues for a broader paradigm where informal and formal reasoning systems, paired with strong theorem retrieval tools, can reduce human effort in mathematical research while producing trustworthy, verifiable results.
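The two-stage workflow in the points above can be pictured as a propose, formalize, check loop. The sketch below is only an illustration under stated assumptions: the names `resolve`, `propose`, `formalize`, `check`, and `CheckResult` are hypothetical and not taken from the paper, the toy functions stand in for Rethlas, Archon, and the Lean 4 checker, and the paper's actual control flow may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of a (hypothetical) proof-checker run."""
    ok: bool
    feedback: str = ""


def resolve(
    problem: str,
    propose: Callable[[str], str],
    formalize: Callable[[str, str], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a formal proof accepted by `check`, or None if the budget runs out."""
    informal = propose(problem)  # informal agent drafts a candidate argument once
    feedback = ""
    for _ in range(max_rounds):
        formal = formalize(informal, feedback)  # formal agent translates or refines
        result = check(formal)                  # checker accepts or rejects
        if result.ok:
            return formal
        feedback = result.feedback              # reuse the failure in the next round
    return None


# Toy stand-ins (assumptions for illustration only).
def toy_propose(problem: str) -> str:
    return f"argument for {problem}"


def toy_formalize(informal: str, feedback: str) -> str:
    # First attempt leaves a gap marked "sorry"; with feedback the gap is closed.
    if feedback:
        return f"formal: {informal}, all steps proved"
    return f"formal: {informal}, sorry"


def toy_check(formal: str) -> CheckResult:
    if "sorry" in formal:
        return CheckResult(False, "unproved step remains")
    return CheckResult(True)


proof = resolve("toy problem", toy_propose, toy_formalize, toy_check)
print(proof)
```

The point of the design is that only the checker decides success: a draft is returned only after it passes `check`, which mirrors the summary's claim that the final result is verified rather than merely plausible.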

Abstract

Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains challenging due to the inherent ambiguity of natural language reasoning. In this paper, we propose an automated framework for tackling research-level mathematical problems that integrates natural language reasoning with formal verification, enabling end-to-end problem solving with minimal human intervention. Our framework consists of two components: an informal reasoning agent, Rethlas, and a formal verification agent, Archon. Rethlas mimics the workflow of human mathematicians by combining reasoning primitives with our theorem search engine, Matlas, to explore solution strategies and construct candidate proofs. Archon, equipped with our formal theorem search engine LeanSearch, translates informal arguments into formalized Lean 4 projects through structured task decomposition, iterative refinement, and automated proof synthesis, ensuring machine-checkable correctness. Using this framework, we automatically resolve an open problem in commutative algebra and formally verify the resulting proof in Lean 4 with essentially no human involvement. Our experiments demonstrate that strong theorem retrieval tools enable the discovery and application of cross-domain mathematical techniques, while the formal agent is capable of autonomously filling nontrivial gaps in informal arguments. More broadly, our work illustrates a promising paradigm for mathematical research in which informal and formal reasoning systems, equipped with theorem retrieval tools, operate in tandem to produce verifiable results, substantially reduce human effort, and offer a concrete instantiation of human-AI collaborative mathematical research.
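For readers unfamiliar with Lean 4, the snippet below gives a sense of what "machine-checkable" means in the abstract. It is a deliberately trivial statement, unrelated to the commutative algebra problem the paper resolves, and the theorem name `add_comm_example` is chosen here for illustration. Lean accepts the file only if the supplied term really proves the stated proposition.

```lean
-- A toy statement and proof; the kernel checks that the term
-- `Nat.add_comm a b` has exactly the stated type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```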