FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse and Prover-Effective Autoformalization

arXiv cs.AI / 3/23/2026

Key Points

  • FormalEvolve is a neuro-symbolic evolutionary framework that combines LLM-driven mutation and crossover with bounded patch repair and symbolic AST rewrites to generate diverse, prover-friendly autoformalizations.
  • The approach reframes autoformalization as a budgeted, test-time search over semantically consistent repertoires to optimize prover performance under resource constraints.
  • Evaluated on CombiBench and ProofNet under a generator-call budget of T=100, FormalEvolve achieves semantic hit rates (SH@100) of 58.0% and 84.9%, respectively, and spreads semantic successes more evenly across problems (lower Gini).
  • Under a fixed prover budget, the method also improves downstream proving performance; the code is planned for public release.
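The search loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the LLM-driven mutation/crossover operators and the Lean compilation gate are replaced by hypothetical string-level stubs (`mutate`, `crossover`, `compiles`), and the key structural ideas are the fixed generator-call budget and the rule that only compiling candidates enter the repertoire.

```python
import random

def compiles(candidate: str) -> bool:
    # Stub compilation gate: the real system would invoke a proof assistant's
    # compiler; here we accept candidates with balanced parentheses.
    return candidate.count("(") == candidate.count(")")

def mutate(candidate: str, rng: random.Random) -> str:
    # Stub for an LLM-driven mutation; here we just append a token.
    return candidate + rng.choice([" (h)", " )", " x"])

def crossover(a: str, b: str, rng: random.Random) -> str:
    # Stub crossover: splice a prefix of one parent onto a suffix of the other.
    return a[: rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]

def budgeted_search(seed: str, budget: int, rng: random.Random) -> list[str]:
    """Compilation-gated evolutionary search under a generator-call budget."""
    repertoire = [seed] if compiles(seed) else []
    for _ in range(budget):  # each iteration spends one generator call
        if len(repertoire) >= 2 and rng.random() < 0.5:
            p, q = rng.sample(repertoire, 2)
            child = crossover(p, q, rng)
        else:
            parent = rng.choice(repertoire) if repertoire else seed
            child = mutate(parent, rng)
        if compiles(child) and child not in repertoire:
            repertoire.append(child)  # only compiling candidates survive
    return repertoire

rng = random.Random(0)
result = budgeted_search("theorem foo : (1 + 1 = 2)", budget=20, rng=rng)
```

In the paper's setting, the surviving repertoire would then be checked for semantic consistency and handed to the prover, with SH@100 measuring how often at least one semantically consistent candidate appears within the budget.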

Abstract

Autoformalization aims to translate natural-language mathematics into compilable, machine-checkable statements. However, semantic consistency does not imply prover effectiveness: even semantically consistent formalizations can differ substantially in proof-search cost and success rate. In this work, we formulate autoformalization as a budgeted, test-time search for semantically consistent repertoires, and propose FormalEvolve, a compilation-gated neuro-symbolic evolutionary framework. FormalEvolve generates diverse candidates via LLM-driven mutation and crossover with bounded patch repair, while symbolic Abstract Syntax Tree (AST) rewrite operations further inject structural diversity. On CombiBench and ProofNet, under a strict generator-call budget of T = 100, FormalEvolve reaches semantic hit rates (SH@100) of 58.0% and 84.9%, and reduces cross-problem concentration of semantic successes (lower Gini). Under a fixed prover budget, FormalEvolve also improves downstream proving performance on CombiBench. Code will be released publicly.
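The "lower Gini" claim refers to the Gini coefficient of per-problem semantic-success counts: a value near 0 means successes are spread evenly across problems, while a value near 1 means they are concentrated on a few. As a quick sketch (the exact counting protocol in the paper may differ), the coefficient can be computed with the standard sorted-weights formula:

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even spread)."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    xs = sorted(counts)
    # G = (2 * sum_i i * x_(i)) / (n * total) - (n + 1) / n,
    # where x_(i) are the counts sorted in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

even = gini([5, 5, 5, 5])      # successes spread evenly  → 0.0
skewed = gini([20, 0, 0, 0])   # concentrated on one problem → 0.75
```

Under this reading, FormalEvolve's lower Gini means its semantic successes are not driven by a handful of easy problems but are distributed more broadly across the benchmark.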