Improving Sparse Memory Finetuning
arXiv cs.LG / 4/8/2026
Key Points
- The paper addresses continual adaptation for LLMs without catastrophic forgetting by localizing learning updates to a small subset of parameters rather than modifying shared dense weights.
- It proposes an open-source Sparse Memory Finetuning (SMF) pipeline that retrofits a pretrained model (Qwen-2.5-0.5B) with explicit sparse memory modules to support continual learning.
- The authors introduce a theoretically motivated slot-selection mechanism that uses KL divergence to direct memory updates toward tokens that are "surprising" relative to a background distribution.
- Experiments show the retrofitted models can learn new factual knowledge while largely preserving held-out capabilities, with minimal forgetting, supporting sparse updates as a practical alternative to dense finetuning.
- The method is positioned as feasible on consumer hardware, lowering barriers to deploying continual learning in real-world settings.
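The KL-based slot selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes surprise is scored per memory slot as that slot's pointwise contribution to KL(p‖q), where p is the slot-access distribution induced by the current input and q is a running background distribution; the function name and top-k selection rule are illustrative choices.

```python
import numpy as np

def select_surprising_slots(access_probs, background_probs, k=2, eps=1e-9):
    """Pick the k memory slots whose access probability deviates most
    from the background distribution, scored by each slot's pointwise
    KL contribution p_i * log(p_i / q_i). (Illustrative sketch only.)"""
    p = np.asarray(access_probs, dtype=float)
    q = np.asarray(background_probs, dtype=float)
    # Pointwise KL terms; eps guards against log(0) and division by zero.
    scores = p * np.log((p + eps) / (q + eps))
    # Highest-scoring slots are the most "surprising" for this input.
    return np.argsort(scores)[::-1][:k]

# Example: a uniform background over 4 slots, with the current input
# concentrating access mass on slot 0.
background = np.full(4, 0.25)
access = np.array([0.70, 0.10, 0.10, 0.10])
selected = select_surprising_slots(access, background, k=1)
print(selected)  # slot 0 is the most surprising
```

Under this scheme, only the parameters of the selected slots would receive gradient updates for the current example, which is how sparse updates localize new knowledge without touching shared dense weights.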

