ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

arXiv cs.AI / 4/10/2026

📰 News

Key Points

  • The paper proposes ProofSketcher, a hybrid system that combines an LLM with a lightweight, trusted proof-checking kernel to improve reliability in mathematical and logical reasoning.
  • Instead of requiring fully formal proof authoring like Lean/Coq, the LLM outputs a typed proof sketch in a compact DSL that the kernel expands into explicit proof obligations.
  • The approach targets common LLM failure modes in proofs—such as omitted side conditions, invalid inference steps, and citations to lemmas not derivable from the given context—by enforcing checkable structure.
  • The core idea is to retain theorem-prover-grade guarantees while reducing the “avalanche” of low-level details typically needed for complete formalization.
  • ProofSketcher is presented as a pipeline bridging natural-language/LLM reasoning and rigorous formal verification with a smaller trusted computing base than full interactive proving.
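The division of labor above can be sketched in miniature. This is an illustrative toy only, not the paper's actual DSL or kernel: the "LLM side" emits typed sketch steps whose informal justifications are ignored, and a tiny trusted kernel expands the sketch into explicit obligations, here discharged by exhaustive truth-table checking of propositional formulas (single-letter variables, `&`, `|`, and fully parenthesized `->`). All names are hypothetical.

```python
import re
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Step:
    claim: str          # propositional formula, e.g. "(p & q) -> p"
    justification: str  # informal hint; deliberately ignored by the kernel

def _eval(expr: str, env: dict) -> bool:
    # On booleans, (a -> b) is equivalent to (a <= b), and Python's
    # bitwise &/| bind tighter than <=, so a textual rewrite suffices.
    return bool(eval(expr.replace("->", "<="), {"__builtins__": {}}, env))

def _tautology(expr: str) -> bool:
    # Enumerate every assignment to the single-letter variables.
    vs = sorted(set(re.findall(r"\b[a-z]\b", expr)))
    return all(_eval(expr, dict(zip(vs, bits)))
               for bits in product([False, True], repeat=len(vs)))

def check(sketch, hypotheses, goal):
    """Trusted kernel: expand the sketch into one obligation per step
    (prior context must entail the claim) plus a final obligation that
    the accumulated context entails the goal."""
    context = list(hypotheses)
    for step in sketch:
        ante = " & ".join(f"({c})" for c in context) or "True"
        if not _tautology(f"({ante}) -> ({step.claim})"):
            return False, f"unjustified step: {step.claim}"
        context.append(step.claim)
    ante = " & ".join(f"({c})" for c in context) or "True"
    if not _tautology(f"({ante}) -> ({goal})"):
        return False, "goal not reached"
    return True, "verified"
```

For example, `check([Step("(p & q) -> p", "projection"), Step("p", "modus ponens")], hypotheses=["p & q"], goal="p")` returns `(True, "verified")`, while an unsupported step such as `Step("q", "hand-wave")` against the lone hypothesis `"p"` is rejected with an explicit unmet obligation, which is exactly the failure mode the checkable structure is meant to surface.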

Abstract

Large language models (LLMs) can produce persuasive arguments in mathematical and logical domains, but these arguments often contain subtle missteps: omitted side conditions, invalid inference patterns, or appeals to lemmas that do not follow from the context at hand. Such errors are notoriously hard to detect from the text alone, because even a flawed construction can look mostly correct. Interactive theorem provers such as Lean and Coq, by contrast, offer rigorous reliability: they accept only statements that pass every syntactic and semantic check performed by a small trusted kernel. This strength comes at a heavy price, however: proofs must be fully formalized, and the user or an auxiliary search procedure must supply an avalanche of low-level detail. This paper presents a hybrid pipeline in which an LLM generates a typed proof sketch in a compact DSL and a lightweight trusted kernel expands the sketch into explicit proof obligations.
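As a hypothetical illustration (not taken from the paper) of the side conditions such proofs tend to omit: cancellation of a common factor, from a * b = a * c conclude b = c, is valid only under the side condition a ≠ 0. In Lean with Mathlib, a kernel-checked proof cannot elide that hypothesis:

```lean
-- The hypothesis `ha : a ≠ 0` cannot be dropped: Mathlib's
-- `mul_left_cancel₀` requires it, so the obligation is explicit.
example (a b c : ℚ) (ha : a ≠ 0) (h : a * b = a * c) : b = c :=
  mul_left_cancel₀ ha h
```

An informal argument that silently assumes a ≠ 0 reads plausibly, but the checker forces the obligation to surface, which is the reliability gap the proposed pipeline targets.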