Trained Persistent Memory for Frozen Decoder-Only LLMs

arXiv cs.AI / 3/25/2026


Key Points

  • The paper investigates whether trained persistent memory adapters—previously demonstrated on frozen encoder-decoder LLMs—can be transferred to decoder-only (GPT-style) models, where persistence must be injected through self-attention rather than cross-attention.
  • It adapts six memory methods (prefix, parallel cross-attention, KV extension, Hebbian memory, context-gated branch, and slot-based sparse write) to a frozen GPT-2, training only a small memory adapter while keeping the backbone fixed.
  • Experiments on LoCoMo reveal an inductive-bias gap at 1× capacity: the three methods with stronger architectural priors (cross-attention read injection, Hebbian memory, and slot write) achieve retained-memory scores of 7–18% and knowledge gains (ΔK) of 7–10, while the other three largely fail (<0.4%).
  • At 10× capacity, performance across all six methods converges, suggesting the low-capacity disparity is driven by architectural read/write mechanisms rather than a fundamental limitation of decoder-only architectures.
  • The authors conclude that persistent latent-space memory is a general paradigm spanning major transformer families, linking prior encoder-decoder results and brain-inspired module ideas.
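The key points note that in a decoder-only model, memory must enter through self-attention alone, e.g. as a KV prefix. A minimal NumPy sketch of that read path, for a single attention head: the backbone activations are frozen stand-ins, and only the memory slots (`mem_k`, `mem_v`, playing the role of the trained adapter parameters) would receive gradients. All names, shapes, and initializations here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem, n_tok = 16, 4, 6          # head dim, memory slots, sequence length

# Frozen backbone activations for one self-attention head (stand-ins).
q = rng.standard_normal((n_tok, d))
k = rng.standard_normal((n_tok, d))
v = rng.standard_normal((n_tok, d))

# Trainable persistent memory: n_mem key/value slots. In the paper's setup,
# these (the adapter theta_mem) are the only trained parameters.
mem_k = rng.standard_normal((n_mem, d)) * 0.02
mem_v = rng.standard_normal((n_mem, d)) * 0.02

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

# Read injection for a decoder-only model: prepend the memory slots to K and V
# so ordinary self-attention reads them implicitly (a KV-prefix-style method).
out = attend(q, np.concatenate([mem_k, k]), np.concatenate([mem_v, v]))
print(out.shape)  # (6, 16): one output per token; no cross-attention needed
```

The point of the sketch is structural: nothing about the attention computation changes, so a frozen backbone can read persistent memory as long as the prefix is prepended at each layer.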

Abstract

Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) showed that trained memory adapters give a frozen encoder-decoder backbone persistent latent-space memory, building on the lateral-memory framework of Jeong (2026b,c). Here we ask whether the same principle transfers to the decoder-only setting, where no cross-attention pathway exists and memory must enter through self-attention alone. We adapt six methods -- prefix, parallel cross-attention, KV extension, Hebbian memory, context-gated branch, and slot-based sparse write -- to a frozen GPT-2, training only a small adapter θ_mem. The write rule is shared; only the read injection changes from decoder cross-attention to self-attention KV prefix or parallel branch. On LoCoMo we find a striking inductive-bias dichotomy: at 1× capacity, three methods with strong architectural priors -- cross-attention (M.2), Hebbian (M.4), and slot write (M.6) -- achieve retained-memory scores of 7–18% and knowledge gains ΔK of 7–10, while the other three fail (<0.4%). At 10× capacity all six converge, showing the gap is architectural, not fundamental. Together with the encoder-decoder results of Jeong (2026a) and the brain-inspired modules of Jeong (2026b,c), these findings establish persistent latent-space memory as a general paradigm spanning major transformer families.
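The abstract states that the write rule is shared across all six methods while only the read injection differs. The paper's actual write rule is not given in this summary, so the following is a hedged sketch of one plausible shape for it, in the spirit of the slot-based sparse write the abstract names: route a hidden state to its best-matching slot and nudge only that slot. The function name `write`, the learning rate `eta`, and the routing-by-dot-product choice are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_mem = 16, 4

# Persistent memory slots, initialized empty.
mem_k = np.zeros((n_mem, d))
mem_v = np.zeros((n_mem, d))

def write(mem_k, mem_v, h, eta=0.1):
    """Slot-based sparse write (illustrative): pick the slot whose key best
    matches hidden state h, then move that slot's key/value toward h."""
    slot = int(np.argmax(mem_k @ h))          # hard routing to one slot
    mem_k[slot] = (1 - eta) * mem_k[slot] + eta * h
    mem_v[slot] = (1 - eta) * mem_v[slot] + eta * h
    return slot

h = rng.standard_normal(d)                    # stand-in backbone hidden state
slot = write(mem_k, mem_v, h)
print(0 <= slot < n_mem)  # True: exactly one slot was updated
```

Under this shape, sharing the write rule while swapping the read path (cross-attention vs. self-attention KV prefix) is straightforward, which is consistent with the abstract's framing that only the read injection changes between transformer families.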