Stateless Decision Memory for Enterprise AI Agents

arXiv cs.AI / 4/23/2026


Key Points

  • The paper argues that regulated enterprise deployment is “load-bearing” on four systems properties (deterministic replay, auditable rationale, multi-tenant isolation, and statelessness for horizontal scale) and that stateful memory architectures violate them by construction, which explains why retrieval-augmented pipelines dominate.
  • It proposes Deterministic Projection Memory (DPM): an append-only event log plus a single task-conditioned projection computed at decision time, which keeps the agent itself stateless.
  • Across ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds: at a 20x compression ratio, factual precision improves by +0.52 and reasoning coherence by +0.53.
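  • DPM is reported to be 7–15x faster under binding budgets because it makes one LLM call at decision time (versus N for summarization). Determinism and audit surface follow the same one-versus-N pattern: DPM exposes one nondeterministic call and logs two LLM calls per decision, while summarization exposes N compounding calls and logs 83-97 on LongHorizon-Bench.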
  • The authors provide practitioner guidance via TAMS for architecture selection and include failure analysis showing why stateful memory can struggle under real enterprise operating conditions.
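
The DPM idea in the key points above (an append-only log, plus one task-conditioned projection at decision time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Event`, `EventLog`, `project`, and `decide`, the prompt wording, the character-based budget, and the `llm` callable (prompt in, text out) are all hypothetical. The sketch also assumes that the two logged LLM calls per decision are the projection call and the decision call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


@dataclass(frozen=True)
class Event:
    seq: int
    tenant: str
    text: str


@dataclass
class EventLog:
    """Append-only log: events are added, never edited or removed."""
    _events: List[Event] = field(default_factory=list)

    def append(self, tenant: str, text: str) -> Event:
        event = Event(seq=len(self._events), tenant=tenant, text=text)
        self._events.append(event)
        return event

    def for_tenant(self, tenant: str) -> Tuple[Event, ...]:
        # Tenant isolation: a decision only ever reads its own tenant's events.
        return tuple(e for e in self._events if e.tenant == tenant)


def project(events: Sequence[Event], task: str, budget_chars: int, llm: LLM) -> str:
    """One task-conditioned pass over the raw log, made at decision time."""
    transcript = "\n".join(f"[{e.seq}] {e.text}" for e in events)
    prompt = (
        f"Task: {task}\n"
        f"Keep only facts relevant to the task, in at most {budget_chars} characters.\n"
        f"{transcript}"
    )
    return llm(prompt)[:budget_chars]  # hard cap on the memory budget


def decide(log: EventLog, tenant: str, task: str, budget_chars: int,
           llm: LLM) -> Tuple[str, List[Tuple[str, str]]]:
    """Stateless decision: everything is derived from the log on each call."""
    audit: List[Tuple[str, str]] = []

    def logged(prompt: str) -> str:
        # Record every prompt/response pair so the decision can be audited.
        output = llm(prompt)
        audit.append((prompt, output))
        return output

    memory = project(log.for_tenant(tenant), task, budget_chars, logged)
    decision = logged(f"Task: {task}\nMemory:\n{memory}\nDecision:")
    return decision, audit
```

Because `decide` keeps nothing between invocations, any worker holding the log can serve or replay a decision, and the audit trail in this sketch stays at two entries regardless of how many events the log contains.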

Abstract

Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures. We argue this reflects a hidden requirement: regulated deployment is load-bearing on four systems properties (deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horizontal scale), and stateful architectures violate them by construction. We propose Deterministic Projection Memory (DPM): an append-only event log plus one task-conditioned projection at decision time. On ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds: at a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen's h=1.17, p=0.0014) and reasoning coherence by +0.53 (h=1.13, p=0.0034), paired permutation, n=10. DPM is additionally 7-15x faster at binding budgets, making one LLM call at decision time instead of N. A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but the asymmetry is structural: DPM exposes one nondeterministic call; summarization exposes N compounding calls. The audit surface follows the same one-versus-N pattern: DPM logs two LLM calls per decision while summarization logs 83-97 on LongHorizon-Bench. We conclude with TAMS, a practitioner heuristic for architecture selection, and a failure analysis of stateful memory under enterprise operating conditions. The contribution is the argument that statelessness is the load-bearing property explaining enterprise's preference for weaker but replayable retrieval pipelines, and that DPM demonstrates this property is attainable without the decisioning penalty retrieval pays.
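
To make the one-versus-N asymmetry concrete, here is a toy model rather than anything taken from the paper. It assumes each LLM call reproduces its earlier output independently with the same probability, so a chain of calls replays exactly only if every call does. The function name `replay_match_probability` and the per-call value `Q = 0.99` are hypothetical; the call counts (2 for DPM, 83 and 97 for summarization) are the logged calls per decision quoted in the abstract. Real compounding is likely worse than this model suggests, since a changed summary also alters the input to every later call.

```python
def replay_match_probability(per_call_match: float, llm_calls: int) -> float:
    """Probability that a whole chain of calls replays identically,
    assuming independent calls with equal per-call match probability."""
    if not 0.0 <= per_call_match <= 1.0:
        raise ValueError("per_call_match must be in [0, 1]")
    if llm_calls < 0:
        raise ValueError("llm_calls must be non-negative")
    return per_call_match ** llm_calls


Q = 0.99  # hypothetical per-call match probability, not a measured value

for label, calls in [("DPM", 2),
                     ("summarization, low end", 83),
                     ("summarization, high end", 97)]:
    print(f"{label}: {calls} calls, "
          f"exact-replay probability {replay_match_probability(Q, calls):.3f}")
```

Under these assumptions the two-call chain replays almost as reliably as a single call, while a chain of 83 or more calls replays exactly less than half the time.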