Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

arXiv cs.AI / 4/29/2026

💬 OpinionDeveloper Stack & InfrastructureIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper proposes upgrading agent-skill security auditing from single-prompt filtering to cross-file reviews by packaging skills as structured SKILL.md-based capability units.
It argues that existing guardrails may inconsistently recover malicious intent under semantics-preserving rewrites, motivating a more robust auditing method.
The authors formulate pre-load auditing for untrusted Agent Skills as a robust three-way classification problem and introduce SkillGuard-Robust.
SkillGuard-Robust uses role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication to improve detection and decision stability.
Across multiple evaluation views on SkillGuardBench and ecosystem extensions (254–404 packages), SkillGuard-Robust achieves very high exact-match performance and malicious-risk recall, while noting that harsher external-source transfer is still challenging.

Abstract

Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact consistency. On the 254-package external-ecosystem view, it reaches 99.66%, 100.00%, and 100.00%, respectively. These results support a bounded conclusion: factorized package auditing materially improves frozen and public-ecosystem robustness, while harsher external-source transfer remains an open challenge.

What to Build Still Beats How

Dev.to

I Build Systems, Flip Land, and Drop Trap Music — Meet Tyler Moncrieff aka Father Dust

Dev.to

From Claim Denials to Smart Decisions: My Experience Using AI in Healthcare Claims Processing

Dev.to

Whatsapp AI booking system in one prompt in 5 minutes

Dev.to

v0.22.1

Ollama Releases

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

Key Points

Abstract

Related Articles

What to Build Still Beats How

I Build Systems, Flip Land, and Drop Trap Music — Meet Tyler Moncrieff aka Father Dust

From Claim Denials to Smart Decisions: My Experience Using AI in Healthcare Claims Processing

Whatsapp AI booking system in one prompt in 5 minutes

v0.22.1

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer