GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent
arXiv cs.CL / March 17, 2026
💬 Opinion / Models & Research
Key Points
- GradMem writes context into memory via per-sample gradient descent at test time, keeping the model's weights frozen throughout.
- It optimizes a model-level, self-supervised context-reconstruction loss, turning the memory write into an iterative, loss-driven procedure with built-in error correction (see the sketch after this list).
- On associative key–value retrieval, GradMem outperforms forward-only memory writers of the same size and scales capacity more effectively with more gradient steps.
- Applied to pretrained language models, it achieves competitive results on natural-language tasks such as bAbI and SQuAD variants while conditioning only on information encoded in memory.
- The approach offers a memory-efficient alternative to large per-layer KV caches for long-context conditioning in transformers.
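The key points describe the mechanism but not the exact memory parameterization, loss, or optimizer, so here is a minimal PyTorch sketch under stated assumptions: a slot-matrix memory read by attention, a toy frozen reader, and SGD on a token-reconstruction loss. `MemoryModule`, `ToyMemoryReader`, `write_context`, and all hyperparameters are illustrative names, not the authors' API.

```python
import torch
import torch.nn as nn

class MemoryModule(nn.Module):
    """A small trainable memory: a slot matrix read via attention.
    (Hypothetical parameterization; the paper may use a different one.)"""
    def __init__(self, num_slots: int, dim: int):
        super().__init__()
        self.slots = nn.Parameter(0.01 * torch.randn(num_slots, dim))

    def read(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (seq, dim) -> attention-weighted readout over memory slots
        attn = torch.softmax(queries @ self.slots.T, dim=-1)
        return attn @ self.slots


class ToyMemoryReader(nn.Module):
    """Stand-in for the frozen model: predicts context tokens from memory
    readouts alone (illustrative, not the paper's architecture)."""
    def __init__(self, vocab_size: int, dim: int, max_len: int = 128):
        super().__init__()
        self.pos_query = nn.Embedding(max_len, dim)  # per-position read queries
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, memory: MemoryModule, positions: torch.Tensor) -> torch.Tensor:
        return self.out(memory.read(self.pos_query(positions)))  # (seq, vocab)


def write_context(frozen_model: nn.Module, memory: MemoryModule,
                  tokens: torch.Tensor, steps: int = 50, lr: float = 0.1) -> None:
    """Write `tokens` into `memory` by per-sample gradient descent on a
    self-supervised reconstruction loss; model weights stay frozen."""
    for p in frozen_model.parameters():
        p.requires_grad_(False)                      # freeze the base model
    opt = torch.optim.SGD(memory.parameters(), lr=lr)
    positions = torch.arange(tokens.size(0))
    for _ in range(steps):                           # more steps -> more capacity
        opt.zero_grad()
        logits = frozen_model(memory, positions)
        # Reconstruction error is the write signal: each step corrects
        # whatever the memory still gets wrong about the context.
        loss = nn.functional.cross_entropy(logits, tokens)
        loss.backward()                              # gradients flow into memory only
        opt.step()


# Usage: write a 24-token context into a 16-slot memory.
model = ToyMemoryReader(vocab_size=100, dim=32)
memory = MemoryModule(num_slots=16, dim=32)
context = torch.randint(0, 100, (24,))
write_context(model, memory, context, steps=50)
```

Raising `steps` is plausibly the knob behind the capacity claim in the third key point: more gradient iterations let the same memory absorb more key-value associations before reconstruction error plateaus. Since only the memory parameters are updated, the footprint is a single slot matrix rather than per-layer KV caches.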