Automatic Textbook Formalization

arXiv cs.AI / 4/6/2026


Key Points

  • The paper presents an AI-driven case study that automatically formalizes a 500+ page graduate-level algebraic combinatorics textbook into Lean, producing a standalone formalization rather than only adapting existing library material.
  • The output is substantial: about 130K lines of Lean code and roughly 5,900 Lean declarations, which the authors describe as a new milestone in scale and proficiency for textbook formalization.
  • The formalization was completed in one week by 30K Claude 4.5 Opus agents working in parallel on a shared, version-controlled code base, which the authors also describe as a record for multi-agent software engineering with usable results.
  • The authors estimate that the inference cost matches or undercuts the salaries a team of human experts would require, and they expect large further efficiency gains to be possible without better models.
  • The authors release their code, the resulting Lean code base, and a side-by-side blueprint website as open source.
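
To put the reported scale in perspective, the headline figures can be turned into a few rough averages. The sketch below uses only the rounded totals quoted in the key points; reading "one week" as 7 days and "500+ pages" as exactly 500 are assumptions made here, and the names `scale_ratios`, `LINES_OF_LEAN`, and so on are illustrative, not taken from the paper.

```python
# Figures as reported in the summary above; all are rounded by the source.
LINES_OF_LEAN = 130_000      # "about 130K lines"
DECLARATIONS = 5_900         # "roughly 5,900 Lean declarations"
AGENTS = 30_000              # "30K ... agents"
PAGES = 500                  # "500+ page" textbook, so a lower bound (assumption)
DAYS = 7                     # assumption: "one week" read as 7 calendar days


def scale_ratios() -> dict[str, float]:
    """Return back-of-the-envelope averages derived from the reported totals."""
    return {
        "lines_per_declaration": LINES_OF_LEAN / DECLARATIONS,
        "lines_per_day": LINES_OF_LEAN / DAYS,
        "lines_per_agent": LINES_OF_LEAN / AGENTS,
        "lines_per_page": LINES_OF_LEAN / PAGES,
        "declarations_per_page": DECLARATIONS / PAGES,
    }


if __name__ == "__main__":
    for name, value in scale_ratios().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the averages come out to roughly 22 lines per declaration, about 18.6K lines per day, and about 260 lines of Lean per textbook page. These are averages over the final code base only; they say nothing about how work was distributed across agents or how much intermediate code was discarded.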

Abstract

We present a case study where an automatic AI system formalizes a textbook with more than 500 pages of graduate-level algebraic combinatorics to Lean. The resulting formalization represents a new milestone in textbook formalization scale and proficiency, moving from early results in undergraduate topology and restructuring of existing library content to a full standalone formalization of a graduate textbook. The formalization comprises 130K lines of code and 5900 Lean declarations and was conducted within one week by a total of 30K Claude 4.5 Opus agents collaborating in parallel on a shared code base via version control, simultaneously setting a record in multi-agent software engineering with usable results. The inference cost matches or undercuts what we estimate as the salaries required for a team of human experts, and we expect there is still the potential for large efficiencies to be made without the need for better models. We make our code, the resulting Lean code base and a side-by-side blueprint website available open-source.