Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification

arXiv cs.AI / 3/23/2026


Key Points

  • The article introduces a neuro-symbolic proof generation framework that combines large language models with interactive theorem proving to automate proof search for system verification.
  • It performs a best-first tree search over proof states, repeatedly querying an LLM (fine-tuned on proof state–step pairs) for the next candidate proof step.
  • On the symbolic side, it integrates ITP tools to repair rejected steps, filter and rank proof states, and automatically discharge subgoals when progress stalls.
  • Implemented on a new Isabelle REPL, it achieves strong results on the FVEL seL4 benchmark, proving up to 77.6% of the theorems and surpassing previous LLM-based approaches and standalone Sledgehammer; results on further Isabelle developments indicate good generalization.

Abstract

Formal verification via interactive theorem proving is increasingly used to ensure the correctness of critical systems, yet constructing large proof scripts remains highly manual and limits scalability. Advances in large language models (LLMs), especially in mathematical reasoning, make their integration into software verification increasingly promising. This paper introduces a neuro-symbolic proof generation framework designed to automate proof search for systems-level verification projects. The framework performs a best-first tree search over proof states, repeatedly querying an LLM for the next candidate proof step. On the neural side, we fine-tune LLMs using datasets of proof state-step pairs; on the symbolic side, we incorporate a range of ITP tools to repair rejected steps, filter and rank proof states, and automatically discharge subgoals when search progress stalls. This synergy enables data-efficient LLM adaptation and semantics-informed pruning of the search space. We implement the framework on a new Isabelle REPL that exposes fine-grained proof states and automation tools, and evaluate it on the FVEL seL4 benchmark and additional Isabelle developments. On seL4, the system proves up to 77.6% of the theorems, substantially surpassing previous LLM-based approaches and standalone Sledgehammer, while solving significantly more multi-step proofs. Results across further benchmarks demonstrate strong generalization, indicating a viable path toward scalable automated software verification.
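The best-first search loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `propose` stands in for the fine-tuned LLM that suggests candidate steps, `apply_step` for the Isabelle REPL that accepts or rejects a step and scores the resulting state, and the toy integer "prover" at the bottom is purely hypothetical.

```python
import heapq
import itertools

def best_first_proof_search(init_state, propose, apply_step, is_proved, budget=1000):
    """Best-first tree search over proof states.

    propose(state)          -> candidate next steps (role of the LLM)
    apply_step(state, step) -> (new_state, score) if accepted, None if
                               rejected (role of the ITP checker)
    Higher score means a more promising state; rejected steps prune the tree.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(0.0, next(counter), init_state, [])]
    while frontier and budget > 0:
        _, _, state, proof = heapq.heappop(frontier)  # most promising state first
        if is_proved(state):
            return proof
        for step in propose(state):
            budget -= 1
            result = apply_step(state, step)
            if result is None:  # step rejected by the checker
                continue
            new_state, score = result
            heapq.heappush(frontier, (-score, next(counter), new_state, proof + [step]))
    return None  # search budget exhausted without a proof

# Toy stand-ins: states are integers, the "goal" is reaching 0,
# and each step subtracts a small amount (hypothetical, for illustration only).
propose = lambda s: [1, 2, 3]
apply_step = lambda s, step: (s - step, -(s - step)) if s - step >= 0 else None
is_proved = lambda s: s == 0

proof = best_first_proof_search(7, propose, apply_step, is_proved)
```

In the paper's framework, the symbolic side additionally repairs rejected steps and filters states before they re-enter the queue; in this sketch, rejection simply prunes the branch.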