
Trained Persistent Memory for Frozen Encoder–Decoder LLMs: Six Architectural Methods

arXiv cs.LG / 3/18/2026


Key Points

  • The paper demonstrates feasibility of persistent memory in the continuous latent space of a frozen encoder–decoder LLM using a single Flan-T5-XL backbone, small trainable adapters, and a single dataset.
  • It presents six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every read and write is a differentiable operation on dense vectors.
  • The memory bank continues to accumulate at inference time without gradients, enabling conversational learning. Under a forgetting-curve evaluation on LoCoMo at 1× and 10× capacity, all six trained adapters produce positive memory-recall curves at 10×, while three of the six collapse at 1×, identifying capacity as a critical design parameter.
  • The authors argue the memory bank can scale to arbitrarily large capacity without altering the backbone, and they frame this pilot study as establishing a baseline and taxonomy for future, larger-scale work.
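The paper does not spell out its adapter internals in this summary; as a minimal sketch of what a differentiable read over a bank of dense vectors can look like, the snippet below uses attention-style pooling. The function name `memory_read` and the plain dot-product scoring are illustrative assumptions, not the paper's method:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query, memory):
    """Differentiable read: attention over memory slots (illustrative).

    query:  (d,) dense query vector, e.g. from a frozen encoder
    memory: (n_slots, d) bank of dense memory vectors
    Returns a (d,) convex combination of slots. Every step is a smooth
    operation on dense vectors, so gradients can flow into a trainable
    adapter even while the backbone stays frozen.
    """
    scores = memory @ query      # (n_slots,) similarity scores
    weights = softmax(scores)    # attention weights over slots
    return weights @ memory      # (d,) read-out vector
```

Because the read is a weighted sum rather than a discrete lookup, it can be trained end-to-end with a loss on the decoder output, which is the property that distinguishes this design from text-level memory systems.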

Abstract

Frozen encoder–decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a proof-of-concept pilot study showing that persistent memory in the continuous latent space of a frozen LLM is feasible, even under severe resource constraints (a single frozen Flan-T5-XL backbone, small trainable adapters, a single dataset). We implement six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every write and read is a differentiable operation on dense vectors. After training only the adapter, the memory bank continues to accumulate at inference time without gradients, enabling conversational learning. Under a forgetting-curve evaluation on LoCoMo at two capacity scales (1× and 10×), the stateless baseline scores exactly zero; at 10× all six trained adapters produce positive memory-recall curves; at 1× three methods collapse, revealing capacity as a critical design parameter. Because the memory bank is a compact numerical array, it can be scaled to arbitrarily large capacity without altering the backbone. We argue that full end-to-end training with larger models, larger data, and orders-of-magnitude larger memory will yield substantially stronger results; this pilot study establishes the feasibility baseline and design-space taxonomy that such efforts require.
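The abstract's 1× vs 10× result can be made concrete with a toy fixed-capacity bank. The sketch below uses a ring-buffer overwrite as the write rule; that is one simple choice among many (the paper explores four write mechanisms, not specified here), and the class name `MemoryBank` is hypothetical. It shows why capacity matters: at small capacity, early entries are overwritten and cannot be recalled.

```python
import numpy as np

class MemoryBank:
    """Fixed-capacity bank of dense vectors, written without gradients.

    Writes are plain array assignments (a ring buffer here), so the bank
    can keep accumulating at inference time. Capacity is the knob that
    the paper's forgetting-curve evaluation varies between 1x and 10x.
    """
    def __init__(self, capacity, dim):
        self.slots = np.zeros((capacity, dim))
        self.capacity = capacity
        self.count = 0  # total writes so far

    def write(self, vec):
        # Gradient-free write: overwrite the oldest slot when full.
        self.slots[self.count % self.capacity] = vec
        self.count += 1

    def read(self, query):
        # Attention-style pooled read over all slots.
        scores = self.slots @ query
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.slots

# Toy demonstration: the same 5 writes into a small vs a large bank.
small, large = MemoryBank(3, 2), MemoryBank(30, 2)
for i in range(5):
    v = np.array([float(i), 1.0])
    small.write(v)
    large.write(v)
# The small bank has evicted the earliest vectors; the large one keeps all.
```

Since the bank is just a numerical array, growing capacity changes only this array's first dimension and leaves the frozen backbone untouched, which is the scaling argument the authors make.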