QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
arXiv cs.CL / 3/17/2026
Key Points
- The paper proposes a data synthesis framework that addresses the scarcity of high-quality real-world SVA corpora: large-scale open-source RTL designs ground the LLM so that the SVAs it generates reflect real-world designs.
- It introduces bidirectional translation as a data selection method, using round-trip consistency to reliably judge whether an NL description and an SVA are semantically equivalent.
- The authors train CodeV-SVA, a series of SVA generation models, on the synthesized data; CodeV-SVA-14B reaches Func.@1 scores of 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine, matching or exceeding advanced LLMs such as GPT-5 and DeepSeek-R1.
- The work demonstrates the viability of RTL-grounded, domain-specific LLMs for hardware verification tasks and could influence future verification tooling and methodology.
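The bidirectional-translation selection step in the key points above can be pictured as a round-trip consistency filter: translate each candidate SVA back to natural language and keep the pair only if the back-translation stays close to the original description. The sketch below is a minimal illustration under assumed names; `round_trip_filter`, the similarity threshold, and the lookup-table "back-translator" (standing in for an LLM call) are all hypothetical, not the paper's actual implementation, which would use an LLM for back-translation and a stronger equivalence check than string similarity.

```python
from difflib import SequenceMatcher


def round_trip_filter(pairs, back_translate, threshold=0.8):
    """Keep (nl, sva) pairs whose SVA, translated back to natural
    language, closely matches the original NL description."""
    kept = []
    for nl, sva in pairs:
        nl_back = back_translate(sva) or ""
        # Toy similarity metric; a real pipeline would use an
        # LLM-based or formal semantic-equivalence judgment.
        score = SequenceMatcher(None, nl.lower(), nl_back.lower()).ratio()
        if score >= threshold:
            kept.append((nl, sva))
    return kept


# Hypothetical back-translator: a lookup table standing in for an LLM.
BACK = {
    "assert property (@(posedge clk) req |-> ##1 ack);":
        "after a request, ack must be asserted one cycle later",
    "assert property (@(posedge clk) grant |-> busy);":
        "the reset signal clears all counters",  # deliberately inconsistent
}

pairs = [
    ("after a request, ack must be asserted one cycle later",
     "assert property (@(posedge clk) req |-> ##1 ack);"),
    ("grant implies the bus is busy",
     "assert property (@(posedge clk) grant |-> busy);"),
]

kept = round_trip_filter(pairs, BACK.get)
print(len(kept))  # only the semantically consistent pair survives
```

The design intuition is that a faithful SVA should survive the round trip NL → SVA → NL; inconsistent pairs, like the second one above, are filtered out of the training corpus.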