From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

arXiv cs.AI / 5/1/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • The paper argues that “persistent AI memory” should not be treated purely as a retrieval-and-recall problem, because production agents often need exact facts, state tracking, updates/deletions, aggregation, relations, negative queries, and explicit unknowns.
  • It proposes a schema-grounded memory approach where schemas specify what must be stored, what can be ignored, and what must never be inferred, preventing unreliable or fabricated memory.
  • The proposed system uses an iterative, schema-aware write pipeline that breaks ingestion into object detection, field detection, and field-value extraction, supported by validation gates, local retries, and stateful prompt control.
  • Evaluation shows strong gains: the proposed system, xmemory, achieves 90.42% object-level accuracy and 62.67% output accuracy on a structured extraction benchmark, 97.10% F1 on an end-to-end memory benchmark, and 95.2% accuracy on an application-level task.
  • The authors conclude that for memory workloads requiring stable records and stateful computation, system architecture (schema grounding and verified writes) can matter more than retrieval scale or raw model strength.
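
To make the schema-grounding idea concrete, here is a minimal sketch (not the paper's actual interface; all names and policies are illustrative assumptions) of a schema that marks each field as required, optional, or never-to-be-inferred, with a validation gate that refuses writes violating those policies:

```python
from dataclasses import dataclass
from enum import Enum

class Policy(Enum):
    REQUIRED = "must be stored"              # reject writes that omit it
    OPTIONAL = "may be ignored"              # store if present
    NEVER_INFER = "must never be inferred"   # only verbatim evidence allowed

@dataclass(frozen=True)
class FieldSpec:
    name: str
    policy: Policy

# Hypothetical schema for a "customer" memory object.
CUSTOMER_SCHEMA = [
    FieldSpec("email", Policy.REQUIRED),
    FieldSpec("nickname", Policy.OPTIONAL),
    FieldSpec("date_of_birth", Policy.NEVER_INFER),
]

def validate_write(record: dict, evidence: str) -> list[str]:
    """Validation gate: return the reasons a candidate record must be rejected."""
    errors = []
    for spec in CUSTOMER_SCHEMA:
        value = record.get(spec.name)
        if spec.policy is Policy.REQUIRED and value is None:
            errors.append(f"missing required field: {spec.name}")
        # A never-infer value must appear verbatim in the source text;
        # otherwise it was fabricated by the model and the write is refused.
        if spec.policy is Policy.NEVER_INFER and value is not None and value not in evidence:
            errors.append(f"unsupported (inferred) value for: {spec.name}")
    return errors
```

A record that omits a required field, or supplies a never-infer value with no textual evidence, fails the gate before anything is written, so fabricated memory never becomes a record.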

Abstract

Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns. These operations require memory to behave less like search and more like a system of record. This paper argues that reliable external AI memory must be schema-grounded. Schemas define what must be remembered, what may be ignored, and which values must never be inferred. We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control. The result shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose. We evaluate this design on structured extraction and end-to-end memory benchmarks. On the extraction benchmark, the judge-in-the-loop configuration reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines. On our end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines. On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses. The results show that, for memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.
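
The write path described above — object detection, then field detection, then field-value extraction, each guarded by a validation gate with local retries and stateful prompt control — can be sketched roughly as follows. The stage functions, gate signatures, and retry budget are illustrative assumptions, not the paper's implementation:

```python
from typing import Callable

MAX_LOCAL_RETRIES = 2  # assumed per-stage retry budget

def run_stage(stage: Callable[[str, dict], dict],
              gate: Callable[[dict], bool],
              text: str, state: dict) -> dict:
    """Run one extraction stage; on gate failure, retry locally
    instead of restarting the whole pipeline."""
    for attempt in range(1 + MAX_LOCAL_RETRIES):
        # Stateful prompt control: a real system would fold the attempt
        # count and prior failures into the next prompt it sends.
        state["attempt"] = attempt
        candidate = stage(text, state)
        if gate(candidate):
            return candidate
    raise ValueError(f"stage {stage.__name__} failed validation after retries")

def write_memory(text: str,
                 detect_objects: Callable, detect_fields: Callable,
                 extract_values: Callable, gates: dict) -> dict:
    """Decomposed write path: objects -> fields -> values,
    each stage independently validated before the next runs."""
    state: dict = {}
    objects = run_stage(detect_objects, gates["objects"], text, state)
    state["objects"] = objects
    fields = run_stage(detect_fields, gates["fields"], text, state)
    state["fields"] = fields
    values = run_stage(extract_values, gates["values"], text, state)
    return {"objects": objects, "fields": fields, "values": values}
```

Because interpretation happens here, on the write path, a later read can be a constrained query over these verified records rather than fresh inference over retrieved prose.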