Governed Reasoning for Institutional AI

arXiv cs.AI / 4/14/2026


Key Points

  • The paper argues that institutional decision-making (e.g., compliance, clinical triage, prior authorization appeals) needs a different AI architecture than general-purpose conversational agents because agents can make “silent” errors without triggering human review.
  • It proposes “Cognitive Core,” a governed decision substrate built from nine typed cognitive primitives and a four-tier governance model where human review is required for execution, not applied as an after-the-fact check.
  • Cognitive Core includes an endogenous, tamper-evident SHA-256 hash-chain audit ledger to support trustworthy accountability, and a demand-driven delegation design for both declared and autonomously reasoned epistemic sequences.
  • In benchmarks on an 11-case prior authorization appeal dataset, Cognitive Core reaches 91% accuracy, outperforming ReAct (55%) and Plan-and-Solve (45%), and it produced zero silent errors versus 5–6 for the baselines.
  • The authors introduce "governability" as a key evaluation metric, measuring how reliably a system knows when it should refrain from autonomous action, and claim new institutional domains can be deployed via YAML configuration rather than engineering work.
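The tamper-evident SHA-256 hash-chain ledger mentioned above can be illustrated with a short sketch. This is not the paper's implementation; the function names and record layout are invented for illustration. The core idea is simply that each audit entry hashes its own payload together with the previous entry's hash, so editing any past entry invalidates every later link:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def append_entry(ledger, payload):
    """Append a tamper-evident entry: the entry's hash covers both its
    payload and the hash of the preceding entry (hypothetical sketch)."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    entry = {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
    }
    ledger.append(entry)
    return entry

def verify_chain(ledger):
    """Recompute every link; any edited entry breaks the chain from
    that point onward, making tampering evident."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        body = json.dumps(
            {"prev": entry["prev"], "payload": entry["payload"]},
            sort_keys=True,
        )
        if hashlib.sha256(body.encode()).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Because the ledger is "endogenous to computation" in the paper's framing, each governed primitive would presumably append its entry as part of executing, rather than logging after the fact.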

Abstract

Institutional decisions -- regulatory compliance, clinical triage, prior authorization appeal -- require a different AI architecture than general-purpose agents provide. Agent frameworks infer authority conversationally, reconstruct accountability from logs, and produce silent errors: incorrect determinations that execute without any human review signal. We propose Cognitive Core: a governed decision substrate built from nine typed cognitive primitives (retrieve, classify, investigate, verify, challenge, reflect, deliberate, govern, generate), a four-tier governance model where human review is a condition of execution rather than a post-hoc check, a tamper-evident SHA-256 hash-chain audit ledger endogenous to computation, and a demand-driven delegation architecture supporting both declared and autonomously reasoned epistemic sequences. We benchmark three systems on an 11-case balanced prior authorization appeal evaluation set. Cognitive Core achieves 91% accuracy against 55% (ReAct) and 45% (Plan-and-Solve). The governance result is more significant: CC produced zero silent errors while both baselines produced 5-6. We introduce governability -- how reliably a system knows when it should not act autonomously -- as a primary evaluation axis for institutional AI alongside accuracy. The baselines are implemented as prompts, representing the realistic deployment alternative to a governed framework. A configuration-driven domain model means deploying a new institutional decision domain requires YAML configuration, not engineering capacity.
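The abstract's claim that deploying a new decision domain "requires YAML configuration, not engineering capacity" implies a declarative domain file. A hypothetical sketch of what such a file might contain, with all field names invented for illustration (the paper does not publish its schema):

```yaml
# Hypothetical domain configuration -- keys are illustrative, not from the paper.
domain: prior_authorization_appeal
primitives:            # subset of the nine typed cognitive primitives
  - retrieve
  - classify
  - verify
  - deliberate
  - govern
governance:
  tier: 3                          # one of the four governance tiers
  human_review: required_for_execution   # review gates execution, not post-hoc
audit:
  ledger: sha256_hash_chain        # tamper-evident, endogenous to computation
```

The design intent, per the abstract, is that swapping this file (policies, primitives, governance tier) stands in for writing new agent code when onboarding a domain such as clinical triage or regulatory compliance.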