Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification

arXiv cs.AI / 3/23/2026


Key Points

  • The paper proposes a hierarchical proof search framework for automated code verification in Lean 4 that decomposes complex verification goals into simpler subgoals before tactic-level proving.
  • It introduces a principled decomposition score that serves as both the training reward and the inference-time ranking criterion to align optimization with deployment.
  • Goedel-Code-Prover-8B is trained via supervised initialization followed by hybrid reinforcement learning, using a single unified policy for both decomposition and completion.
  • On three Lean-based benchmarks totaling 427 tasks, the model achieves 62.0% prove success, a 2.6× improvement over the strongest baseline, and demonstrates scalable, efficient inference compared to larger neural provers.

Abstract

Large language models (LLMs) can generate plausible code but offer limited guarantees of correctness. Formally verifying that implementations satisfy specifications requires constructing machine-checkable proofs, a task that remains beyond current automation. We propose a hierarchical proof search framework for automated code verification in Lean 4 that decomposes complex verification goals into structurally simpler subgoals before attempting tactic-level proving. Central to our approach is a principled decomposition score that combines constructive justification with structural effectiveness. Crucially, this score serves as both the training reward and the inference-time ranking criterion, ensuring strict alignment between optimization and deployment. We train Goedel-Code-Prover-8B, a single unified policy for both decomposition and completion, via supervised initialization followed by hybrid reinforcement learning, where a continuous decomposition reward drives planning exploration while supervised replay stabilizes proof generation. On three Lean-based code verification benchmarks comprising 427 tasks, our 8B-parameter model achieves a 62.0% prove success rate, a 2.6× improvement over the strongest baseline, surpassing neural provers up to 84× larger. We further observe consistent inference-time scaling: success rates improve monotonically with search iterations and sampling budget, with our trained model achieving greater efficiency than frontier off-the-shelf models of comparable scale.
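To make the decomposition idea concrete, here is a minimal Lean 4 illustration in the spirit of the abstract (not taken from the paper): a conjunction-shaped verification goal for a small hypothetical function `myMax` is split into two structurally simpler subgoals, each of which is then closed by a short tactic proof.

```lean
-- Hypothetical illustration of goal decomposition for code verification.
-- `myMax` and its spec are invented for this sketch.
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

-- The top-level goal is decomposed by `constructor` into two subgoals,
-- `a ≤ myMax a b` and `b ≤ myMax a b`, each finished at the tactic level.
theorem myMax_spec (a b : Nat) : a ≤ myMax a b ∧ b ≤ myMax a b := by
  constructor
  · unfold myMax; split <;> omega
  · unfold myMax; split <;> omega
```

In the paper's framework, the interesting question is which decompositions to attempt and in what order; the decomposition score is what ranks candidates like the `constructor` split above against alternatives.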