
Trained Persistent Memory for Frozen Encoder–Decoder LLMs: Six Architectural Methods

arXiv cs.LG / 3/18/2026


Key Points

  • The paper demonstrates feasibility of persistent memory in the continuous latent space of a frozen encoder–decoder LLM using a single Flan-T5-XL backbone, small trainable adapters, and a single dataset.
  • It presents six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every read and write is a differentiable operation on dense vectors.
  • The memory bank continues to accumulate at inference time without gradients, enabling conversational learning. Under a forgetting-curve evaluation on LoCoMo at 1× and 10× capacity, all six trained adapters produce positive memory-recall curves at 10×, while three of the six collapse at 1×, identifying capacity as a critical design parameter.
  • The authors argue the memory bank can scale to arbitrarily large capacity without altering the backbone, and they frame this pilot study as establishing a baseline and taxonomy for future, larger-scale work.
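The paper does not spell out its adapter internals in this summary; as a minimal sketch of what a differentiable read over a bank of dense vectors can look like, the snippet below uses attention-style pooling. The function name `memory_read` and the plain dot-product scoring are illustrative assumptions, not the paper's method:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query, memory):
    """Differentiable read: attention over memory slots (illustrative).

    query:  (d,) dense query vector, e.g. from a frozen encoder
    memory: (n_slots, d) bank of dense memory vectors
    Returns a (d,) convex combination of slots. Every step is a smooth
    operation on dense vectors, so gradients can flow into a trainable
    adapter even while the backbone stays frozen.
    """
    scores = memory @ query      # (n_slots,) similarity scores
    weights = softmax(scores)    # attention weights over slots
    return weights @ memory      # (d,) read-out vector
```

Because the read is a weighted sum rather than a discrete lookup, it can be trained end-to-end with a loss on the decoder output, which is the property that distinguishes this design from text-level memory systems.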

Abstract

Frozen encoder–decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a proof-of-concept pilot study showing that persistent memory in the continuous latent space of a frozen LLM is feasible, even under severe resource constraints (a single frozen Flan-T5-XL backbone, small trainable adapters, a single dataset). We implement six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every write and read is a differentiable operation on dense vectors. After training only the adapter, the memory bank continues to accumulate at inference time without gradients, enabling conversational learning. Under a forgetting-curve evaluation on LoCoMo at two capacity scales (1× and 10×), the stateless baseline scores exactly zero; at 10× all six trained adapters produce positive memory-recall curves; at 1× three methods collapse, revealing capacity as a critical design parameter. Because the memory bank is a compact numerical array, it can be scaled to arbitrarily large capacity without altering the backbone. We argue that full end-to-end training with larger models, larger data, and orders-of-magnitude larger memory will yield substantially stronger results; this pilot study establishes the feasibility baseline and design-space taxonomy that such efforts require.
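The abstract's 1× vs 10× result can be made concrete with a toy fixed-capacity bank. The sketch below uses a ring-buffer overwrite as the write rule; that is one simple choice among many (the paper explores four write mechanisms, not specified here), and the class name `MemoryBank` is hypothetical. It shows why capacity matters: at small capacity, early entries are overwritten and cannot be recalled.

```python
import numpy as np

class MemoryBank:
    """Fixed-capacity bank of dense vectors, written without gradients.

    Writes are plain array assignments (a ring buffer here), so the bank
    can keep accumulating at inference time. Capacity is the knob that
    the paper's forgetting-curve evaluation varies between 1x and 10x.
    """
    def __init__(self, capacity, dim):
        self.slots = np.zeros((capacity, dim))
        self.capacity = capacity
        self.count = 0  # total writes so far

    def write(self, vec):
        # Gradient-free write: overwrite the oldest slot when full.
        self.slots[self.count % self.capacity] = vec
        self.count += 1

    def read(self, query):
        # Attention-style pooled read over all slots.
        scores = self.slots @ query
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.slots

# Toy demonstration: the same 5 writes into a small vs a large bank.
small, large = MemoryBank(3, 2), MemoryBank(30, 2)
for i in range(5):
    v = np.array([float(i), 1.0])
    small.write(v)
    large.write(v)
# The small bank has evicted the earliest vectors; the large one keeps all.
```

Since the bank is just a numerical array, growing capacity changes only this array's first dimension and leaves the frozen backbone untouched, which is the scaling argument the authors make.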