SkillTester: Benchmarking Utility and Security of Agent Skills
arXiv cs.AI / 4/1/2026
Opinion · Tools & Practical Usage · Models & Research
Key Points
- The arXiv paper introduces SkillTester, a benchmarking tool designed to evaluate both the utility and the security of agent skills.
- Its framework runs each task in a paired setup, once as a baseline and once with the skill enabled, and combines this with a separate security probe suite to measure differences in performance and safety.
- Results are normalized into a utility score, a security score, and a three-level security status label to make comparisons more consistent and interpretable.
- The project is presented as a quality-assurance harness for agent-first systems, with a deployed public service (skilltester.ai) and an associated GitHub repository for ongoing maintenance.
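
The paired-run and normalization idea described in the key points can be sketched roughly as follows. This is a minimal illustration only, not SkillTester's actual implementation: the summary gives no formulas, so the 0-100 scales, the success-rate difference used for utility, the thresholds, and the label names `pass`, `warn`, and `fail` are all assumptions, as are the function and class names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SkillReport:
    """Hypothetical report shape; the real tool's fields may differ."""
    utility_score: float   # assumed 0..100 scale, 50 means no change
    security_score: float  # assumed 0..100 scale
    security_status: str   # one of three assumed labels


def utility_score(baseline: Sequence[bool], with_skill: Sequence[bool]) -> float:
    """Map the paired success-rate difference onto 0..100 (assumed formula)."""
    if not baseline or len(baseline) != len(with_skill):
        raise ValueError("need non-empty, equally sized paired runs")
    base_rate = sum(baseline) / len(baseline)
    skill_rate = sum(with_skill) / len(with_skill)
    delta = skill_rate - base_rate  # lies in [-1, 1]
    return 50.0 * (delta + 1.0)


def security_score(probe_passed: Sequence[bool]) -> float:
    """Share of security probes the skill passed, scaled to 0..100."""
    if not probe_passed:
        raise ValueError("need at least one probe result")
    return 100.0 * sum(probe_passed) / len(probe_passed)


def security_status(score: float, warn_at: float = 70.0, pass_at: float = 90.0) -> str:
    """Bucket a security score into three levels (thresholds are assumptions)."""
    if score >= pass_at:
        return "pass"
    if score >= warn_at:
        return "warn"
    return "fail"


def evaluate(baseline: Sequence[bool],
             with_skill: Sequence[bool],
             probe_passed: Sequence[bool]) -> SkillReport:
    """Combine paired task outcomes and probe outcomes into one report."""
    sec = security_score(probe_passed)
    return SkillReport(
        utility_score=utility_score(baseline, with_skill),
        security_score=sec,
        security_status=security_status(sec),
    )


if __name__ == "__main__":
    # Made-up outcomes: the same four tasks without and with the skill,
    # plus ten security probes of which one failed.
    report = evaluate(
        baseline=[True, False, False, False],
        with_skill=[True, True, True, False],
        probe_passed=[True] * 9 + [False],
    )
    print(report)
```

Keeping utility and security as separate numbers, with the status label derived only from the security score, mirrors the three outputs named above; how the real tool weights or thresholds them is not stated in this summary.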




