From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification
arXiv cs.AI / 4/27/2026
Key Points
- The paper argues that LLMs can aid software engineering, but their generated code often fails correctness requirements due to errors and hallucinations.
- It introduces the NL2VC-60 dataset (60 complex algorithmic problems) and studies how to convert informal natural-language problem statements into precise, Dafny-based formal specifications plus implementation logic.
- Across seven open-weight LLMs, the authors compare tiered prompting strategies — contextless prompting, signature-guided prompting, and iterative self-healing driven by Dafny verifier feedback — and find that contextless prompting performs very poorly.
- Adding structural "signatures" and Dafny-driven self-healing yields large improvements, including high verification success for Gemma 4-31B (90.91%) and a major jump for GPT-OSS 120B (from 0% to 81.82%) when guided by signature-based feedback.
- To prevent vacuous proofs (where models pass with trivial specs), the work uses uDebug for functional validation, aiming for higher assurance beyond mere verifier acceptance.
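The iterative self-healing loop described above can be sketched as a simple generate-verify-repair cycle. The sketch below is illustrative only: `call_llm` and `run_dafny_verifier` are hypothetical stand-ins (a real pipeline would call a model API and the `dafny verify` CLI), and the stubbed behavior merely simulates a model that repairs a missing postcondition after seeing verifier feedback.

```python
# Sketch of an iterative "self-healing" loop: generate Dafny code, run the
# verifier, and feed its error messages back into the next prompt.
# call_llm and run_dafny_verifier are hypothetical stubs for illustration.

def call_llm(prompt: str) -> str:
    # Stub: pretend the model adds the missing postcondition once it
    # sees verifier feedback in the prompt.
    if "verifier reported" in prompt:
        return "method Abs(x: int) returns (y: int) ensures y >= 0 { ... }"
    return "method Abs(x: int) returns (y: int) { ... }"  # no ensures clause

def run_dafny_verifier(code: str) -> tuple[bool, str]:
    # Stub: accept only code that carries a postcondition.
    if "ensures" in code:
        return True, ""
    return False, "postcondition might not hold"

def self_heal(problem: str, max_rounds: int = 3) -> tuple[bool, str, int]:
    """Generate, verify, and repair until the verifier accepts or we give up."""
    prompt = f"Write verified Dafny code for: {problem}"
    code = ""
    for attempt in range(1, max_rounds + 1):
        code = call_llm(prompt)
        ok, errors = run_dafny_verifier(code)
        if ok:
            return True, code, attempt
        # Append verifier feedback so the next generation can repair the spec.
        prompt = f"{prompt}\nThe verifier reported: {errors}\nFix the code."
    return False, code, max_rounds
```

Note that verifier acceptance alone is not enough: a model could "pass" with a vacuous spec (e.g., `ensures true`), which is why the paper adds uDebug-based functional validation on top of this loop.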