Knowledge Capsules: Structured Nonparametric Memory Units for LLMs

arXiv cs.CL / April 23, 2026


Key Points

  • The paper argues that conventional LLM knowledge storage in parametric weights is expensive to update, and that standard RAG is indirect because retrieved knowledge competes with input tokens in attention.
  • It introduces “Knowledge Capsules,” structured nonparametric memory units that encode normalized relational knowledge built directly from document corpora using a frozen base model.
  • Instead of appending knowledge as text, the approach uses an External Key Value Injection (KVI) framework to compile capsules into attention-compatible key/value representations for direct participation in attention computation.
  • The authors report consistent improvements over RAG and GraphRAG on multiple QA benchmarks, especially for long-context and multi-hop reasoning, without requiring any parameter updates.
  • The contribution is positioned as shifting knowledge integration from context-level token augmentation to memory-level interaction, aiming to improve stability and accuracy.
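The injection idea in the points above can be sketched as ordinary scaled dot-product attention whose key/value lists are extended with precompiled capsule entries, so the external knowledge participates in the same softmax rather than being appended as input tokens. The function name, shapes, and the simple concatenation below are illustrative assumptions, not the paper's actual KVI implementation (capsule compilation and per-layer placement are not shown):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_kvi(q, k, v, capsule_k=None, capsule_v=None):
    """Scaled dot-product attention with optional external key/value injection.

    Capsule keys/values are concatenated along the sequence axis, so query
    positions can attend to them directly, without the capsules consuming
    input-token positions. This is a hypothetical sketch of the KVI idea.
    """
    if capsule_k is not None:
        k = np.concatenate([k, capsule_k], axis=0)
        v = np.concatenate([v, capsule_v], axis=0)
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # (num_queries, num_keys)
    return softmax(scores) @ v          # (num_queries, head_dim)

# Toy example: 2 query positions, 3 context tokens, 2 injected capsule slots.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
capsule_k = rng.normal(size=(2, 4))
capsule_v = rng.normal(size=(2, 4))

out = attention_with_kvi(q, k, v, capsule_k, capsule_v)
```

The output keeps the query-side shape; only the set of attendable memories grows, which is the claimed contrast with RAG-style context expansion.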

Abstract

Large language models (LLMs) encode knowledge in parametric weights, making it costly to update or extend without retraining. Retrieval-augmented generation (RAG) mitigates this limitation by appending retrieved text to the input, but operates purely through context expansion, where external knowledge competes as tokens within the attention mechanism. As a result, its influence is indirect and often unstable, particularly in long-context and multi-hop reasoning scenarios. We propose Knowledge Capsules, structured nonparametric memory units that represent normalized relational knowledge and can be constructed directly from document corpora using a frozen base model. Instead of injecting knowledge as text, we introduce an External Key Value Injection (KVI) framework that compiles capsules into attention-compatible key/value representations, enabling external knowledge to directly participate in the model's attention computation. By shifting knowledge integration from context-level augmentation to memory-level interaction, the proposed framework consistently outperforms RAG and GraphRAG across multiple QA benchmarks, with improved stability and accuracy in long-context and multi-hop reasoning, while requiring no parameter updates.