Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents

arXiv cs.AI / April 29, 2026


Key Points

  • The paper argues that current framings of “verifiable intelligence” (reducing time-to-solution via learned structure and test-time search, plus learned runtimes that move computation, memory, and I/O into model state) still fail to explain why capable models are hard to deploy in real open institutions.
  • It proposes “intent compilation,” transforming partially specified human purpose into inspectable artifacts that explicitly bind and constrain execution.
  • The authors distinguish closed-world solvers from open-world agents, where verification is distributed across semantic, evidentiary, procedural, and institutional dimensions.
  • They formalize remaining uncertainty in open settings as a “closure-gap vector,” introduce “delegation envelopes” as pre-authorized action-space regions, and separate “misclosure” from “undersearch.”
  • The work outlines benchmark metrics to evaluate when closure-focused interventions outperform simply adding more inference-time search.

Abstract

Recent work has framed intelligence in verifiable tasks as reducing time-to-solution through learned structure and test-time search, while systems work has explored learned runtimes in which computation, memory and I/O migrate into model state. These perspectives do not explain why capable models remain difficult to deploy in open institutions. We propose intent compilation: the transformation of partially specified human purpose into inspectable artifacts that bind execution. The relevant deployment distinction is closed-world solver versus open-world agent. In closed worlds, a checker is largely given; in open worlds, verification is distributed across semantic, evidentiary, procedural and institutional dimensions. We formalize this residual openness as a closure-gap vector, define delegation envelopes as pre-authorized regions of action space, distinguish misclosure from undersearch, and outline benchmark metrics for testing when closure interventions outperform additional inference-time search.
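One way to picture the paper's two central constructs is a component-wise bound check: a closure-gap vector records residual uncertainty along the four verification dimensions, and a delegation envelope pre-authorizes action only when every component stays within its bound. The following is a minimal sketch under that reading; all names and the scalar-per-dimension encoding are illustrative assumptions, not the authors' formalism:

```python
from dataclasses import dataclass

# Hypothetical illustration only: ClosureGap and DelegationEnvelope are
# assumed names, not an API from the paper.

@dataclass
class ClosureGap:
    """Residual openness along the paper's four verification dimensions."""
    semantic: float       # ambiguity in what the intent means
    evidentiary: float    # uncertainty about what evidence confirms success
    procedural: float     # uncertainty about which steps are permitted
    institutional: float  # uncertainty about who may authorize the outcome

@dataclass
class DelegationEnvelope:
    """Pre-authorized region of action space: the agent may act only if
    each gap component stays within its authorized bound."""
    bounds: ClosureGap

    def permits(self, gap: ClosureGap) -> bool:
        return (gap.semantic <= self.bounds.semantic
                and gap.evidentiary <= self.bounds.evidentiary
                and gap.procedural <= self.bounds.procedural
                and gap.institutional <= self.bounds.institutional)

envelope = DelegationEnvelope(bounds=ClosureGap(0.2, 0.3, 0.1, 0.1))
gap = ClosureGap(semantic=0.1, evidentiary=0.25,
                 procedural=0.05, institutional=0.4)
print(envelope.permits(gap))  # institutional gap exceeds its bound -> False
```

Under this reading, acting when `permits` is false would be misclosure (the verification context is not sufficiently closed), whereas failing inside a permitted envelope would be undersearch (too little inference-time search within an authorized region).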