HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?
arXiv cs.AI / April 20, 2026
Key Points
- The paper introduces HarmfulSkillBench, a first-of-its-kind benchmark to evaluate agent safety risks specifically from “harmful skills” that can be reused in public agent skill ecosystems.
- Large-scale measurement across two major registries (98,440 skills total) finds 4.93% (4,858) of skills are harmful, with a higher harmful rate on ClawHub (8.84%) than on Skills.Rest (3.49%).
- The authors use an LLM-driven scoring approach based on a harmful-skill taxonomy to identify and quantify harmful skills at scale.
- Testing six LLMs shows that using a pre-installed harmful skill significantly reduces refusal rates and increases harm scores, especially when the malicious intent is implicit in the skill rather than explicitly stated in the user request.
- The study includes responsible disclosure to the affected registries and releases the benchmark to enable future research on mitigating harmful-skill weaponization in realistic agent settings.
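The LLM-driven scoring step can be sketched as a small audit loop: build a classification prompt from a skill's metadata, ask a judge model to map it onto a harm taxonomy, and parse the structured verdict. The taxonomy categories, prompt wording, and JSON verdict format below are illustrative assumptions, not the paper's actual pipeline, and a canned response stands in for the real LLM call:

```python
import json

# Hypothetical taxonomy categories -- illustrative only; the paper's
# actual harmful-skill taxonomy is not reproduced here.
TAXONOMY = ["malware", "phishing", "data-exfiltration", "benign"]

def build_prompt(skill_name: str, skill_description: str) -> str:
    """Construct a classification prompt asking a judge LLM to score a skill."""
    return (
        "You are auditing skills from a public agent-skill registry.\n"
        f"Taxonomy categories: {', '.join(TAXONOMY)}\n"
        f"Skill name: {skill_name}\n"
        f"Skill description: {skill_description}\n"
        'Respond with JSON: {"category": "<taxonomy category>", '
        '"harm_score": <integer 0-10>}.'
    )

def parse_verdict(raw_response: str):
    """Parse the judge's JSON verdict and flag the skill if harmful.

    The threshold of 5 is an assumption for illustration.
    """
    verdict = json.loads(raw_response)
    is_harmful = verdict["category"] != "benign" and verdict["harm_score"] >= 5
    return verdict["category"], verdict["harm_score"], is_harmful

# Canned response standing in for a real judge-model API call:
raw = '{"category": "phishing", "harm_score": 8}'
category, score, flagged = parse_verdict(raw)
print(category, score, flagged)  # phishing 8 True
```

At registry scale (tens of thousands of skills), this loop would simply be run once per skill and the flagged fraction reported, which is the kind of aggregate rate the benchmark measures.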