Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

arXiv cs.AI / 5/4/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that “agent skills” (structured instruction/script/reference bundles used with an LLM) should be treated as untrusted code until explicitly verified by the runtime that loads them.
  • It emphasizes that relying on trust signals like signatures, clearance levels, or registry provenance is unsafe, and instead the runtime must enforce a default-deny posture until verification passes.
  • Without skill verification, human-in-the-loop (HITL) oversight must run on every irreversible action, which the authors say becomes impractical and turns into ineffective rubber-stamping at scale.
  • The authors propose a trust schema with per-skill manifest verification levels, a capability gate whose HITL policy is a function of those levels, and a “biconditional” correctness criterion that any verification method must satisfy under adversarial evaluation; a sketch of the schema-and-gate shape follows this list.
  • They also provide a portable runtime profile with ten normative guidelines derived from a working open-source reference implementation, aiming for model-agnostic adoption without retraining or fine-tuning.
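
As a concrete illustration of that schema-and-gate shape, the Python sketch below is ours, not the paper's API; every name in it (VerificationLevel, SkillManifest, requires_hitl) is hypothetical. What it shows is the one property the paper insists on: the HITL decision is a pure function of the manifest's verification level, and the default is deny.

```python
# Hypothetical sketch of the trust schema and capability gate described
# in the paper. All names here are invented for illustration; only the
# shape matters: HITL is a function of verification level, default-deny.
from dataclasses import dataclass
from enum import Enum

class VerificationLevel(Enum):
    UNVERIFIED = 0      # default for any freshly loaded skill
    SELF_DECLARED = 1   # manifest claims only; no independent check
    VERIFIED = 2        # passed the runtime's verification procedure

@dataclass(frozen=True)
class SkillManifest:
    name: str
    declared_capabilities: frozenset[str]
    verification_level: VerificationLevel = VerificationLevel.UNVERIFIED

def requires_hitl(manifest: SkillManifest, action_irreversible: bool) -> bool:
    """Default-deny: only a VERIFIED skill may take an irreversible
    action without a human in the loop. Signatures, clearances, and
    registry provenance do not change the answer."""
    if not action_irreversible:
        return False
    return manifest.verification_level is not VerificationLevel.VERIFIED

# Usage: an unverified skill always trips the gate on irreversible calls.
m = SkillManifest("pdf-export", frozenset({"fs.write"}))
assert requires_hitl(m, action_irreversible=True)
```

On this reading, HITL volume scales with the unverified fraction of the skill set rather than with the total number of irreversible calls, which is the sustainability argument the abstract makes.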

Abstract

Agent skills -- structured packages of instructions, scripts, and references that augment a large language model (LLM) without modifying the model itself -- have moved from convenience to first-class deployment artifact. The runtime that loads them inherits the same problem package managers and operating systems have always faced: a piece of content claims a behavior; the runtime must decide whether to believe it. We argue this paper's central thesis up front: a skill is untrusted code until it is verified, and the runtime that loads it must enforce that default rather than infer trust from a signature, a clearance, or a registry of origin. Without skill verification, a human-in-the-loop (HITL) gate must fire on every irreversible call -- which is operationally untenable and degrades into rubber-stamping at any non-trivial scale. With skill verification treated as a separate, gated process, HITL fires only for what is unverified, and the system becomes sustainable. We give a trust schema that includes an explicit verification level on every skill manifest; a capability gate whose HITL policy is a function of that verification level; a biconditional correctness criterion that any candidate verification procedure must satisfy on an adversarial-ensemble exercise; and a portable runtime profile with ten normative guidelines abstracted from a working open-source reference implementation [metere2026enclawed]. The contribution is harness- and model-agnostic; nothing here requires retraining, fine-tuning, or proprietary infrastructure.
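
The abstract names the criterion but does not state it; “biconditional” suggests both directions must hold over the adversarial ensemble. One plausible formalization -- ours, not the paper's -- with V the candidate verification procedure, E the adversarial ensemble, and conforms(s) the ground-truth predicate that skill s's behavior matches its manifest claims:

```latex
% A plausible reading, not the paper's formalism: V must accept
% exactly the conforming skills in the adversarial ensemble E.
\[
  \forall s \in E:\quad V(s) = \mathrm{accept} \iff \mathrm{conforms}(s)
\]
```

Either direction alone is degenerate: a procedure that rejects everything is trivially sound (no false accepts), and one that accepts everything is trivially complete (no false rejects); requiring the biconditional on an adversarial ensemble rules out both.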