Harness as an Asset: Enforcing Determinism via the Convergent AI Agent Framework (CAAF)

arXiv cs.AI / 4/21/2026

Key Points

  • The paper argues that LLM-based agents create a controllability gap in safety-critical engineering because even rare undetected constraint violations can make systems undeployable.
  • It proposes the Convergent AI Agent Framework (CAAF) to enforce “Fail-Safe Determinism” using recursive decomposition with physical context firewalls, a “Harness as an Asset” approach that encodes domain invariants into machine-readable registries, and structured semantic gradients with state locking for monotonic convergence.
  • Experiments in autonomous driving (SAE Level 3) and pharmaceutical continuous-flow reactor design show that CAAF, with all components running GPT-4o-mini, achieved 100% paradox/violation detection, while a monolithic GPT-4o achieved 0% even at temperature=0.
  • The authors show that alternative multi-agent approaches (e.g., debate or sequential checking) also detected 0% of violations across 80 trials, and an ablation study (Mono+UAI) isolates the Unified Assertion Interface (UAI) as the core reliability driver.
  • CAAF is reported to be robust to prompt hints and to support fully offline deployment by relying on a single commodity model for all components.
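The "Harness as an Asset" pillar amounts to encoding domain invariants in a machine-readable registry and checking them with a deterministic gate rather than asking the LLM to self-verify. Below is a minimal sketch of that idea; the class names, constraint values, and function signatures are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Invariant:
    """One machine-readable domain invariant (hypothetical schema)."""
    name: str
    check: Callable[[Dict[str, float]], bool]  # pure, deterministic predicate
    message: str

# Example registry with made-up pharmaceutical-style constraints.
REGISTRY: List[Invariant] = [
    Invariant("max_reactor_temp",
              lambda s: s["temp_C"] <= 150.0,
              "Reactor temperature must not exceed 150 C"),
    Invariant("min_residence_time",
              lambda s: s["residence_s"] >= 30.0,
              "Residence time must be at least 30 s"),
]

def uai_check(state: Dict[str, float]) -> List[str]:
    """Deterministic assertion gate: evaluate every registered invariant
    against a proposed design state and return all violation messages."""
    return [inv.message for inv in REGISTRY if not inv.check(state)]

# A proposal that violates both constraints is flagged regardless of how
# confidently the generating model phrased it.
violations = uai_check({"temp_C": 180.0, "residence_s": 12.0})
```

Because the gate is ordinary code, its verdict is identical on every run, which is the property the paper contrasts with stochastic LLM self-correction.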

Abstract

Large Language Models (LLMs) produce a controllability gap in safety-critical engineering: even low rates of undetected constraint violations render a system undeployable. Current orchestration paradigms suffer from sycophantic compliance, context attention decay [Liu et al., 2024], and stochastic oscillation during self-correction [Huang et al., 2024]. We introduce the Convergent AI Agent Framework (CAAF), which transitions agentic workflows from open-loop generation to closed-loop Fail-Safe Determinism via three pillars: (1) Recursive Atomic Decomposition with physical context firewalls; (2) Harness as an Asset, formalizing domain invariants into machine-readable registries enforced by a deterministic Unified Assertion Interface (UAI); and (3) Structured Semantic Gradients with State Locking for monotonic convergence. Empirical evaluation across two domains -- SAE Level 3 (L3) autonomous driving (AD) (n=30, 7 conditions) and pharmaceutical continuous-flow reactor design (n=20, 4 conditions including a Mono+UAI ablation) -- shows that CAAF, with all components running GPT-4o-mini, achieves 100% paradox detection while monolithic GPT-4o achieves 0% (even at temperature=0). The pharmaceutical benchmark features 7 simultaneous constraints with nonlinear Arrhenius interactions and a 3-way minimal unsatisfiable subset, making it a structurally harder challenge than the 2-constraint AD paradox. Alternative multi-agent architectures (debate, sequential checking) also achieve 0% across 80 trials, confirming that CAAF's reliability derives from its deterministic UAI, not from multi-agent orchestration per se. A Mono+UAI ablation (95%) isolates the UAI as the core contribution. CAAF's reliability is invariant to prompt hints, and all components use a single commodity model, enabling fully offline deployment.
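The "State Locking" idea in pillar (3) can be sketched as a closed loop in which any atomic sub-result that passes the deterministic gate is frozen and never reopened, so the set of validated decisions only grows. The loop below is a hypothetical illustration under that assumption, not the paper's code; `propose` stands in for an LLM call and `check` for the UAI gate.

```python
def solve_with_locking(subtasks, propose, check, max_iters=10):
    """Iterate over unsolved subtasks; lock each one the moment its
    candidate passes the deterministic check, guaranteeing that the
    locked set grows monotonically (no regression on later passes)."""
    locked = {}
    for _ in range(max_iters):
        pending = [t for t in subtasks if t not in locked]
        if not pending:
            break  # converged: every subtask validated and frozen
        for task in pending:
            candidate = propose(task, dict(locked))  # locked context is read-only
            if check(task, candidate):               # deterministic gate
                locked[task] = candidate             # lock: never reopened
    return locked

# Toy run: task "a" fails once then succeeds; "b" succeeds immediately.
attempts = {"a": iter([0, 3]), "b": iter([5])}
result = solve_with_locking(
    subtasks=["a", "b"],
    propose=lambda task, ctx: next(attempts[task]),
    check=lambda task, cand: cand > 0,
)
```

The key design point mirrored here is that retries only touch still-pending subtasks, which rules out the oscillation the abstract attributes to unconstrained self-correction.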