QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
arXiv cs.CL / 3/17/2026
Key Points
- The paper proposes a data synthesis framework that addresses the scarcity of high-quality real-world SVA corpora: large-scale open-source RTL designs ground the LLM so that the SVAs it generates reflect real-world designs.
- It introduces bidirectional translation as a data selection method, using round-trip consistency to reliably judge whether an NL description and an SVA are semantically equivalent.
- The authors train CodeV-SVA, a series of SVA generation models, on the synthesized data; CodeV-SVA-14B reaches Func.@1 scores of 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine, matching or exceeding advanced LLMs such as GPT-5 and DeepSeek-R1.
- The work demonstrates the viability of RTL-grounded, domain-specific LLMs for hardware verification tasks and could influence future verification tooling and methodology.
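The bidirectional-translation selection step in the key points above can be pictured as a round-trip consistency filter: translate each candidate SVA back to natural language and keep the pair only if the back-translation stays close to the original description. The sketch below is a minimal illustration under assumed names; `round_trip_filter`, the similarity threshold, and the lookup-table "back-translator" (standing in for an LLM call) are all hypothetical, not the paper's actual implementation, which would use an LLM for back-translation and a stronger equivalence check than string similarity.

```python
from difflib import SequenceMatcher


def round_trip_filter(pairs, back_translate, threshold=0.8):
    """Keep (nl, sva) pairs whose SVA, translated back to natural
    language, closely matches the original NL description."""
    kept = []
    for nl, sva in pairs:
        nl_back = back_translate(sva) or ""
        # Toy similarity metric; a real pipeline would use an
        # LLM-based or formal semantic-equivalence judgment.
        score = SequenceMatcher(None, nl.lower(), nl_back.lower()).ratio()
        if score >= threshold:
            kept.append((nl, sva))
    return kept


# Hypothetical back-translator: a lookup table standing in for an LLM.
BACK = {
    "assert property (@(posedge clk) req |-> ##1 ack);":
        "after a request, ack must be asserted one cycle later",
    "assert property (@(posedge clk) grant |-> busy);":
        "the reset signal clears all counters",  # deliberately inconsistent
}

pairs = [
    ("after a request, ack must be asserted one cycle later",
     "assert property (@(posedge clk) req |-> ##1 ack);"),
    ("grant implies the bus is busy",
     "assert property (@(posedge clk) grant |-> busy);"),
]

kept = round_trip_filter(pairs, BACK.get)
print(len(kept))  # only the semantically consistent pair survives
```

The design intuition is that a faithful SVA should survive the round trip NL → SVA → NL; inconsistent pairs, like the second one above, are filtered out of the training corpus.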