Intent-aligned Formal Specification Synthesis via Traceable Refinement

arXiv cs.LG / 4/14/2026

📰 NewsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper introduces VeriSpecGen, a framework that synthesizes intent-aligned formal specifications in Lean from natural language while maintaining traceability to individual requirements.
  • It decomposes user intents into atomic requirements, generates requirement-targeted tests with explicit traceability maps, and uses localized clause-level repair when validation fails.
  • VeriSpecGen reports strong benchmark performance, reaching 86.6% on the VERINA SpecGen task with Claude Opus 4.5 and improving baselines by up to 31.8 points across model families.
  • Beyond inference, the method produces 343K training examples from refinement trajectories, and training on these trajectories yields reported 62–106% improvements and transfer to general reasoning abilities.

Abstract

Large language models are increasingly used to generate code from natural language, but ensuring correctness remains challenging. Formal verification offers a principled way to obtain such guarantees by proving that a program satisfies a formal specification. However, specifications are frequently missing in real-world codebases, and writing high-quality specifications remains expensive and expertise-intensive. We present VeriSpecGen, a traceable refinement framework that synthesizes intent-aligned specifications in Lean through requirement-level attribution and localized repair. VeriSpecGen decomposes natural language into atomic requirements and generates requirement-targeted tests with explicit traceability maps to validate generated specifications. When validation fails, traceability maps attribute failures to specific requirements, enabling targeted clause-level repairs. VeriSpecGen achieve 86.6% on VERINA SpecGen task using Claude Opus 4.5, improving over baselines by up to 31.8 points across different model families and scales. Beyond inference-time gains, we generate 343K training examples from VeriSpecGen refinement trajectories and demonstrate that training on these trajectories substantially improves specification synthesis by 62-106% relative and transfers gains to general reasoning abilities.