Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

arXiv cs.AI · April 30, 2026


Key Points

  • The paper addresses the long-context bottleneck in large language model inference caused by KV (key-value) cache memory overhead, arguing that current eviction policies are mostly heuristic rather than theoretically grounded.
  • By adopting the Information Bottleneck principle under a linear-Gaussian surrogate of attention, the authors derive a closed-form mutual-information objective that quantifies the effective information capacity of a retained KV subset.
  • The framework shows that many existing KV eviction strategies can be viewed as approximations of a single capacity-maximization principle, reframing eviction as an information-preservation problem.
  • Based on this theory, the paper proposes CapKV, a capacity-aware eviction method that uses a log-determinant approximation with statistical leverage scores to preserve maximum predictive signal.
  • Experiments across multiple models and long-context benchmarks indicate that CapKV improves the memory-efficiency vs. generation-fidelity trade-off and consistently outperforms prior eviction approaches.
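To make the selection mechanism concrete, here is a minimal sketch of leverage-score-based KV eviction. All names and details are illustrative assumptions, not the paper's actual CapKV implementation: it keeps the cached keys whose statistical leverage (diagonal of the hat matrix of the key matrix) is highest, which greedily approximates maximizing the log-determinant capacity objective described above.

```python
# Hypothetical sketch of capacity-aware KV eviction via leverage scores.
# This is NOT the paper's code; it only illustrates the selection idea.
import numpy as np

def leverage_scores(K):
    """Statistical leverage of each row of the key matrix K (n_tokens x d_head).

    Row i's leverage is the i-th diagonal entry of the hat matrix
    K (K^T K)^{-1} K^T. High-leverage rows contribute most to
    log det(K_S^T K_S), a proxy for the retained subset's capacity.
    """
    # Economy QR: leverage equals the squared row norms of Q (stable to compute).
    Q, _ = np.linalg.qr(K)
    return np.sum(Q ** 2, axis=1)

def evict(K, budget):
    """Keep the `budget` cached tokens with the highest leverage scores."""
    scores = leverage_scores(K)
    keep = np.argsort(scores)[-budget:]
    return np.sort(keep)  # preserve original token order

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 16))  # 128 cached keys, head dimension 16
kept = evict(K, budget=32)          # indices of the 32 tokens to retain
```

A useful sanity check on this scheme: leverage scores lie in [0, 1] and sum to the rank of K, so the budget is spent on the rows that span the key subspace most informatively.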

Abstract

Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on empirical heuristics, lacking a rigorous theoretical foundation. This work rethinks KV cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, we derive a closed-form mutual information objective that characterizes the effective information capacity of a retained KV cache subset. This formulation reveals that a wide range of existing eviction strategies can be interpreted as different approximations of the same capacity-maximization principle. Guided by this insight, we introduce CapKV, a capacity-aware eviction method that directly targets information preservation via a log-determinant approximation using statistical leverage scores. This approach replaces heuristic selection with a theoretically grounded mechanism that preserves the maximum predictive signal. Extensive experiments across multiple models and long-context benchmarks show that CapKV consistently outperforms prior methods, achieving a better trade-off between memory efficiency and generation fidelity.
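For intuition on what a "closed-form mutual information objective" looks like in the linear-Gaussian setting, consider the standard Gaussian-channel result. The notation below is an illustrative assumption, not the paper's exact model: the retained keys act as a linear observation of a Gaussian signal.

```latex
% Illustrative linear-Gaussian surrogate (assumed notation, not the paper's):
% observe y = K_S x + \varepsilon, with x \sim \mathcal{N}(0, I_d) and
% \varepsilon \sim \mathcal{N}(0, \sigma^2 I_m), where K_S \in \mathbb{R}^{m \times d}
% stacks the m retained key vectors. The mutual information is then
\[
  I(x;\, y) \;=\; \tfrac{1}{2} \log \det\!\left( I_m + \sigma^{-2}\, K_S K_S^{\top} \right),
\]
% so eviction reduces to choosing the subset S that maximizes this
% log-determinant, i.e. a capacity-maximization problem.
```

Under a formulation of this kind, heuristic scores (attention mass, recency, norms) can be read as cheap surrogates for each token's marginal contribution to the log-determinant.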