From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks
arXiv cs.AI / 4/6/2026
Key Points
- The paper introduces a new dataset that pairs CAPEC and CWE descriptions with vulnerable code snippets to address limitations of existing vulnerability datasets that lack detailed code-to-vulnerability mappings.
- It uses GPT-4o, Llama, and Claude to generate targeted code examples whose vulnerabilities align with specific CAPEC/CWE entries.
- Preliminary results indicate that the generated code is highly consistent across the three LLMs, with a reported cosine similarity of 0.98 among their outputs.
- The dataset contains 615 CAPEC code snippets in Java, Python, and JavaScript and is positioned as a resource both for research into vulnerability understanding and for training ML models for vulnerability detection and remediation.
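
The summary reports a 0.98 cosine similarity among the three models' outputs but does not say how each code snippet was turned into a vector (the authors may well have used learned embeddings). The sketch below only illustrates the metric itself, assuming a simple bag-of-tokens representation: each snippet becomes a vector of token counts, cosine similarity is the dot product divided by the product of the vector norms, and consistency across models is the mean over all model pairs. The tokenizer regex and the function names `tokenize`, `cosine_similarity`, and `mean_pairwise_similarity` are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(code: str) -> list[str]:
    # Hypothetical tokenizer (an assumption, not the paper's method):
    # identifiers, integer literals, and any other single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the token-count vectors of two code snippets."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both snippets contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty snippet is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(snippets: list[str]) -> float:
    """Average cosine similarity over every unordered pair of snippets."""
    pairs = list(combinations(snippets, 2))
    if not pairs:
        raise ValueError("need at least two snippets to compare")
    return sum(cosine_similarity(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative outputs from three hypothetical models for the same
    # CWE-89 (SQL injection) prompt; these are made-up strings, not paper data.
    outputs = [
        'cursor.execute("SELECT * FROM users WHERE name = \'" + name + "\'")',
        'cursor.execute("SELECT * FROM users WHERE name = \'" + user + "\'")',
        'db.execute("SELECT * FROM users WHERE name = \'" + name + "\'")',
    ]
    print(f"mean pairwise cosine similarity: {mean_pairwise_similarity(outputs):.3f}")
```

Token counts ignore ordering and semantics, so two snippets can score high while differing in behavior; an embedding-based comparison would capture more meaning, at the cost of depending on a particular model.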
