Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs

arXiv cs.AI / 4/21/2026


Key Points

  • The paper argues that issues like context pollution and “overthinking” in tool-using autonomous LLM agents are driven by missing second-order metacognitive governance rather than lack of model skill diversity or raw capability.
  • It proposes translating human-style cognitive control into a single-agent architecture, emphasizing delayed appraisal, epistemic vigilance, and “region-of-proximal offloading.”
  • The authors introduce MESA-S (Metacognitive Skills for Agents, Single-agent), which reformulates confidence estimation as a vector that separates self-confidence (parametric certainty) from source-confidence (trust in retrieved external procedures).
  • By using mechanisms such as delayed procedural probing and “Metacognitive Skill Cards,” the framework decouples assessing a skill’s utility from the token-heavy execution of that skill.
  • Early evaluations on an in-context static benchmark executed with Gemini 3.1 Pro indicate that explicit trust provenance and delayed escalation can reduce reasoning loops and mitigate supply-chain-style vulnerabilities while preventing offloading-induced confidence inflation.
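The separation of self-confidence from source-confidence, together with the delayed-probe gating the paper describes, can be pictured as a small data structure plus a decision rule. The sketch below is purely illustrative: the `SkillCard` fields, the thresholds, and the three outcomes are assumptions for exposition, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class SkillCard:
    """Hypothetical 'Metacognitive Skill Card': a cheap summary of a skill
    that can be appraised without running its token-heavy procedure."""
    name: str
    summary: str              # what the skill claims to do
    source_confidence: float  # trust in the retrieved external procedure, in [0, 1]

def decide(self_confidence: float, card: SkillCard,
           self_threshold: float = 0.8, source_threshold: float = 0.5) -> str:
    """Illustrative gating rule: answer from parametric knowledge when
    self-confidence is high; offload only to sufficiently trusted skills;
    otherwise delay and probe the skill first (epistemic vigilance)."""
    if self_confidence >= self_threshold:
        return "answer directly"                 # prune unnecessary reasoning loops
    if card.source_confidence >= source_threshold:
        return f"execute skill: {card.name}"     # trusted offloading
    return "probe skill before use"              # delayed escalation

card = SkillCard("web_search", "retrieve fresh facts", source_confidence=0.3)
print(decide(0.4, card))  # low self- and source-confidence -> probe first
```

Keeping the two confidences as separate components, rather than collapsing them into one scalar, is what lets the rule distinguish "I don't know" from "I don't trust this tool."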

Abstract

As large language models (LLMs) transition into autonomous agents integrated with extensive tool ecosystems, traditional routing heuristics increasingly succumb to context pollution and "overthinking". We argue that the bottleneck is not a deficit in algorithmic capability or skill diversity, but the absence of disciplined second-order metacognitive governance. Our scientific contribution is the computational translation of human cognitive control (specifically, delayed appraisal, epistemic vigilance, and region-of-proximal offloading) into a single-agent architecture. We introduce MESA-S (Metacognitive Skills for Agents, Single-agent), a preliminary framework that reformulates scalar confidence estimation as a vector separating self-confidence (parametric certainty) from source-confidence (trust in retrieved external procedures). By formalizing a delayed procedural probe mechanism and introducing Metacognitive Skill Cards, MESA-S decouples awareness of a skill's utility from its token-intensive execution. Evaluated on an in-context static benchmark executed natively via Gemini 3.1 Pro, our early results suggest that explicitly programming trust provenance and delayed escalation mitigates supply-chain vulnerabilities, prunes unnecessary reasoning loops, and prevents offloading-induced confidence inflation. This architecture offers a scientifically cautious, behaviorally anchored step toward reliable, epistemically vigilant single-agent orchestration.
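The decoupling of appraisal from execution that the abstract describes can be sketched as a two-phase loop: a cheap pass over skill cards, then execution of at most one trusted skill. Everything here (the card representation, the 0.5 trust threshold, and the escalation message) is a hypothetical illustration of that pattern, not the paper's code.

```python
from typing import Callable, Optional

def appraise(cards: dict[str, float]) -> Optional[str]:
    """Phase 1 (cheap): rank skills by the source-confidence recorded on
    their cards alone, without executing anything. Return the best skill
    if it clears the trust threshold, else None."""
    best = max(cards, key=cards.get)
    return best if cards[best] >= 0.5 else None

def run(cards: dict[str, float],
        executors: dict[str, Callable[[], str]]) -> str:
    """Phase 2 (expensive): execute only the skill selected during
    appraisal; if nothing is trusted, escalate to a probing path rather
    than blindly offloading (guarding against supply-chain-style risks)."""
    choice = appraise(cards)
    if choice is None:
        return "escalate: probe skills before trusting them"
    return executors[choice]()

cards = {"calculator": 0.9, "unverified_plugin": 0.2}
executors = {"calculator": lambda: "2+2=4",
             "unverified_plugin": lambda: "untrusted output"}
print(run(cards, executors))  # only the trusted skill is ever executed
```

Because phase 1 touches only the cards, the token-intensive executors are never invoked during appraisal, which is the cost-saving the Skill Card mechanism is aiming at.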