QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
arXiv cs.CL / 3/17/2026
Key Points
- The paper proposes a data synthesis framework to address the scarcity of high-quality real-world SystemVerilog Assertion (SVA) corpora, using large-scale open-source RTL designs to guide LLMs in generating realistic SVAs.
- It introduces bidirectional translation as a data selection method that reliably determines semantic equivalence between natural-language (NL) specifications and SVAs.
- The authors train CodeV-SVA, a series of SVA generation models, on the synthesized data; CodeV-SVA-14B achieves 75.8% Func.@1 on NL2SVA-Human and 84.0% on NL2SVA-Machine, matching or exceeding advanced LLMs such as GPT-5 and DeepSeek-R1.
- The work demonstrates the viability of RTL-grounded, domain-specific LLMs for hardware verification tasks and could influence future verification tooling and methodology.
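The bidirectional-translation selection step described above can be sketched as a round-trip filter: translate each NL specification to an SVA, back-translate the SVA to NL, and keep only pairs whose back-translation agrees with the original. The sketch below is a minimal illustration, not the paper's implementation; the translation functions are stubs standing in for LLM calls, and the Jaccard similarity is a hypothetical stand-in for whatever equivalence judge the authors use.

```python
# Hypothetical sketch of bidirectional-translation data selection.
# In the paper, nl_to_sva / sva_to_nl would be LLM calls; here they
# are toy lookup tables so the filtering logic is self-contained.

def nl_to_sva(nl: str) -> str:
    """Stub forward translation (NL spec -> SVA assertion)."""
    table = {
        "req must be followed by ack within 2 cycles":
            "assert property (@(posedge clk) req |-> ##[1:2] ack);",
    }
    return table.get(nl, "assert property (1);")  # fallback: trivial assertion

def sva_to_nl(sva: str) -> str:
    """Stub back-translation (SVA -> NL description)."""
    table = {
        "assert property (@(posedge clk) req |-> ##[1:2] ack);":
            "req must be followed by ack within 2 cycles",
    }
    return table.get(sva, "unknown property")

def similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) as a cheap stand-in for an LLM equivalence judge."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def select_pairs(nl_specs, threshold=0.8):
    """Keep (NL, SVA) pairs whose back-translated NL agrees with the original."""
    kept = []
    for nl in nl_specs:
        sva = nl_to_sva(nl)
        back = sva_to_nl(sva)
        if similarity(nl, back) >= threshold:
            kept.append((nl, sva))
    return kept

pairs = select_pairs([
    "req must be followed by ack within 2 cycles",  # round-trips cleanly -> kept
    "some spec the model cannot translate",          # round-trip fails -> dropped
])
```

The key design point is that the filter never trusts a single translation direction: only pairs that survive the round trip are added to the training corpus.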