Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

arXiv cs.CL / 4/6/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

Key Points

  • The paper warns that LLM coding agents can be compromised through third-party “skill” packages from open marketplaces, since these skills run as operational directives with system-level privileges.
  • It introduces Document-Driven Implicit Payload Execution (DDIPE), an attack that hides malicious logic inside code examples and configuration templates in skill documentation that agents may reuse automatically.
  • Using an LLM-driven method, the authors generate 1,070 adversarial skills spanning 15 MITRE ATT&CK categories and show DDIPE bypass rates of 11.6% to 33.5% across four frameworks and five models.
  • While static analysis catches most malicious skills, a small fraction (2.5%) evade both detection and alignment, indicating residual risks even with defenses.
  • The work reports responsible disclosure results: four confirmed vulnerabilities and two fixes, highlighting the need for stronger security review and safer documentation/code reuse practices for agent skill ecosystems.

Abstract

LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATTACK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.