QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
arXiv cs.AI / 4/13/2026
Key Points
- The paper introduces QuanBench+, a unified benchmark for LLM-based quantum code generation that aligns tasks across Qiskit, PennyLane, and Cirq to reduce confounding from framework-specific knowledge.
- It includes 42 aligned tasks covering quantum algorithms, gate decomposition, and state preparation, and evaluates models using executable functional tests plus metrics such as Pass@1/Pass@5 and KL-divergence-based acceptance for probabilistic outputs.
- The study measures not only one-shot performance but also “feedback-based repair,” where models revise code after runtime errors or incorrect answers, leading to substantial gains in best scores across all three frameworks.
- Reported best one-shot Pass@1 results are 59.5% (Qiskit), 54.8% (Cirq), and 42.9% (PennyLane), while feedback-based repair boosts them to 83.3%, 76.2%, and 66.7% respectively.
- Overall, the findings suggest meaningful progress but indicate that reliable multi-framework quantum code generation remains largely unsolved and strongly dependent on framework-specific knowledge.
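The metrics named above can be made concrete with a short sketch. Below is the standard unbiased Pass@k estimator together with a KL-divergence acceptance check for probabilistic circuit outputs; the acceptance threshold and the exact distance rule are illustrative assumptions, not values taken from the paper.

```python
from math import comb, log

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples drawn from n generations (of which c are correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two measurement distributions, smoothed with eps
    so that empty bins do not produce infinities."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def accept_output(measured: list[float], expected: list[float],
                  threshold: float = 0.05) -> bool:
    """Accept a probabilistic circuit output when its KL divergence from
    the reference distribution is small. The 0.05 threshold is a
    hypothetical placeholder, not the benchmark's actual setting."""
    return kl_divergence(measured, expected) <= threshold

# Example: 4 generations, 2 correct -> Pass@1 = 0.5
print(pass_at_k(4, 2, 1))
```

A divergence-based acceptance rule like this is what lets the benchmark grade tasks whose correct answer is a distribution over measurement outcomes (e.g. a prepared Bell state) rather than a single deterministic value.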