IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

arXiv cs.AI / 4/7/2026


Key Points

  • The paper introduces IC3-Evolve, an automated offline code-evolution framework that uses an LLM to propose small, auditable, slot-restricted patches to an IC3 (PDR) hardware model checker implementation.
  • It enforces correctness via proof-/witness-gated validation, requiring independently checkable certificates for SAFE results and replayable counterexample traces for UNSAFE results to prevent unsound changes.
  • Because LLM inference is used only during offline patch search, the final deployed artifact is a standalone evolved checker with no runtime ML/LLM dependency or inference overhead.
  • The approach is evolved on the HWMCC benchmark and evaluated on unseen public and industrial benchmarks. The results show that it can reliably discover practical heuristic improvements that generalize beyond the training set under strict correctness gates.
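The workflow in these key points can be sketched as a greedy offline search. Every candidate must clear a correctness gate before its speed is even considered. This is a minimal illustration under stated assumptions, not the paper's implementation. `RunResult`, `gate`, `evolve`, and the `propose` callback (a stand-in for the LLM suggesting slot-restricted patches) are hypothetical names, and ranking candidates by total runtime is an assumed objective.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunResult:
    verdict: str      # "SAFE", "UNSAFE", or "UNKNOWN" (e.g. a timeout)
    certified: bool   # did the SAFE certificate / UNSAFE trace pass an independent check?
    seconds: float    # wall-clock cost; UNKNOWN runs are assumed to report the timeout


Checker = Callable[[str], RunResult]  # hypothetical: one checker variant, benchmark name -> result


def gate(checker: Checker, benchmarks: List[str]) -> Optional[float]:
    """Return the total runtime, or None if any decided verdict lacks a valid certificate."""
    total = 0.0
    for bench in benchmarks:
        result = checker(bench)
        if result.verdict in ("SAFE", "UNSAFE") and not result.certified:
            return None  # a single unverifiable answer rejects the whole candidate
        total += result.seconds
    return total


def evolve(baseline: Checker, propose: Callable[[Checker], List[Checker]],
           benchmarks: List[str], rounds: int) -> Checker:
    """Greedy offline search: keep a candidate only if it passes the gate and is faster."""
    best, best_cost = baseline, gate(baseline, benchmarks)
    if best_cost is None:
        raise ValueError("baseline checker must pass the gate")
    for _ in range(rounds):
        for candidate in propose(best):  # stand-in for LLM-proposed, slot-restricted patches
            cost = gate(candidate, benchmarks)
            if cost is not None and cost < best_cost:
                best, best_cost = candidate, cost
    return best  # an ordinary checker callable; no model is needed to run it
```

Because `evolve` returns a plain checker callable, the selected variant runs with no model in the loop. This mirrors the point that the LLM is needed only during offline search. The correctness gate is checked before runtime, so a faster but uncertified candidate can never displace the current best.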

Abstract

IC3, also known as property-directed reachability (PDR), is a commonly used algorithm for hardware safety model checking. It checks whether a state transition system satisfies a given safety property. IC3 either returns UNSAFE (indicating a property violation) with a counterexample trace, or SAFE with a checkable inductive invariant as the proof of safety. In practice, IC3's performance is dominated by a large web of interacting heuristics and implementation choices, which makes manual tuning costly, brittle, and hard to reproduce. This paper presents IC3-Evolve, an automated offline code-evolution framework that uses an LLM to propose small, slot-restricted, and auditable patches to an IC3 implementation. Crucially, every candidate patch is admitted only through proof-/witness-gated validation: SAFE runs must emit a certificate that is independently checked, and UNSAFE runs must emit a replayable counterexample trace, preventing unsound edits from being deployed. Since the LLM is used only offline, the deployed artifact is a standalone evolved checker with zero ML/LLM inference overhead and no runtime model dependency. We evolve on the public Hardware Model Checking Competition (HWMCC) benchmark and evaluate generalizability on unseen public and industrial model checking benchmarks, showing that IC3-Evolve can reliably discover practical heuristic improvements under strict correctness gates.
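To make the SAFE/UNSAFE certificate requirement concrete, here is a minimal explicit-state sketch of the two independent checks. It assumes a toy system given as Python sets rather than a hardware circuit. A SAFE certificate `inv` is accepted only if three conditions hold: it contains every initial state (Init ⇒ Inv), it is preserved by every transition (Inv ∧ T ⇒ Inv′), and it implies the property (Inv ⇒ P). An UNSAFE witness is accepted only if it starts in an initial state, follows real transitions, and ends in a state that violates the property. The function names and the counter example are hypothetical illustrations, not the paper's checker. A real hardware checker would discharge these conditions symbolically (for example, with a SAT solver) instead of enumerating states.

```python
from typing import Callable, List, Set, Tuple

State = int
Trans = Set[Tuple[State, State]]
Prop = Callable[[State], bool]


def check_invariant(init: Set[State], trans: Trans, prop: Prop, inv: Set[State]) -> bool:
    """Validate a SAFE certificate: inv is initiated, inductive, and implies prop."""
    if not init <= inv:                      # initiation: Init => Inv
        return False
    for s, t in trans:                       # consecution: Inv(s) & T(s, t) => Inv(t)
        if s in inv and t not in inv:
            return False
    return all(prop(s) for s in inv)         # safety: Inv => P


def replay_trace(init: Set[State], trans: Trans, prop: Prop, trace: List[State]) -> bool:
    """Validate an UNSAFE witness: a genuine path from Init to a state violating prop."""
    if not trace or trace[0] not in init:    # must start in an initial state
        return False
    for a, b in zip(trace, trace[1:]):       # every step must be a real transition
        if (a, b) not in trans:
            return False
    return not prop(trace[-1])               # the last state must violate the property


# Hypothetical toy system: a 3-bit counter that starts at 0 and adds 2 modulo 8.
INIT = {0}
TRANS = {(s, (s + 2) % 8) for s in range(8)}

print(check_invariant(INIT, TRANS, lambda s: s != 5, {0, 2, 4, 6}))  # True: "even" is inductive
print(check_invariant(INIT, TRANS, lambda s: s != 5, {0, 2}))        # False: 2 -> 4 leaves the set
print(replay_trace(INIT, TRANS, lambda s: s != 6, [0, 2, 4, 6]))     # True: state 6 is reachable
```

The two checks are deliberately independent of how the answer was found. This is what makes such a gate robust to arbitrary heuristic patches: a patched checker may search however it likes, but its output is accepted only if it verifies on its own.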