Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

arXiv cs.AI · April 30, 2026


Key Points

  • The paper addresses the long-context bottleneck in large language model inference caused by KV (key-value) cache memory overhead, arguing that current eviction policies are mostly heuristic rather than theoretically grounded.
  • By adopting the Information Bottleneck principle under a linear-Gaussian surrogate of attention, the authors derive a closed-form mutual-information objective that quantifies the effective information capacity of a retained KV subset.
  • The framework shows that many existing KV eviction strategies can be viewed as approximations of a single capacity-maximization principle, reframing eviction as an information-preservation problem.
  • Based on this theory, the paper proposes CapKV, a capacity-aware eviction method that uses a log-determinant approximation with statistical leverage scores to preserve maximum predictive signal.
  • Experiments across multiple models and long-context benchmarks indicate that CapKV improves the memory-efficiency vs. generation-fidelity trade-off and consistently outperforms prior eviction approaches.
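To make the selection mechanism concrete, here is a minimal sketch of leverage-score-based KV eviction. All names and details are illustrative assumptions, not the paper's actual CapKV implementation: it keeps the cached keys whose statistical leverage (diagonal of the hat matrix of the key matrix) is highest, which greedily approximates maximizing the log-determinant capacity objective described above.

```python
# Hypothetical sketch of capacity-aware KV eviction via leverage scores.
# This is NOT the paper's code; it only illustrates the selection idea.
import numpy as np

def leverage_scores(K):
    """Statistical leverage of each row of the key matrix K (n_tokens x d_head).

    Row i's leverage is the i-th diagonal entry of the hat matrix
    K (K^T K)^{-1} K^T. High-leverage rows contribute most to
    log det(K_S^T K_S), a proxy for the retained subset's capacity.
    """
    # Economy QR: leverage equals the squared row norms of Q (stable to compute).
    Q, _ = np.linalg.qr(K)
    return np.sum(Q ** 2, axis=1)

def evict(K, budget):
    """Keep the `budget` cached tokens with the highest leverage scores."""
    scores = leverage_scores(K)
    keep = np.argsort(scores)[-budget:]
    return np.sort(keep)  # preserve original token order

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 16))  # 128 cached keys, head dimension 16
kept = evict(K, budget=32)          # indices of the 32 tokens to retain
```

A useful sanity check on this scheme: leverage scores lie in [0, 1] and sum to the rank of K, so the budget is spent on the rows that span the key subspace most informatively.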

Abstract

Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on empirical heuristics, lacking a rigorous theoretical foundation. This work rethinks KV cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, we derive a closed-form mutual information objective that characterizes the effective information capacity of a retained KV cache subset. This formulation reveals that a wide range of existing eviction strategies can be interpreted as different approximations of the same capacity-maximization principle. Guided by this insight, we introduce CapKV, a capacity-aware eviction method that directly targets information preservation via a log-determinant approximation using statistical leverage scores. This approach replaces heuristic selection with a theoretically grounded mechanism that preserves the maximum predictive signal. Extensive experiments across multiple models and long-context benchmarks show that CapKV consistently outperforms prior methods, achieving a better trade-off between memory efficiency and generation fidelity.
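For intuition on what a "closed-form mutual information objective" looks like in the linear-Gaussian setting, consider the standard Gaussian-channel result. The notation below is an illustrative assumption, not the paper's exact model: the retained keys act as a linear observation of a Gaussian signal.

```latex
% Illustrative linear-Gaussian surrogate (assumed notation, not the paper's):
% observe y = K_S x + \varepsilon, with x \sim \mathcal{N}(0, I_d) and
% \varepsilon \sim \mathcal{N}(0, \sigma^2 I_m), where K_S \in \mathbb{R}^{m \times d}
% stacks the m retained key vectors. The mutual information is then
\[
  I(x;\, y) \;=\; \tfrac{1}{2} \log \det\!\left( I_m + \sigma^{-2}\, K_S K_S^{\top} \right),
\]
% so eviction reduces to choosing the subset S that maximizes this
% log-determinant, i.e. a capacity-maximization problem.
```

Under a formulation of this kind, heuristic scores (attention mass, recency, norms) can be read as cheap surrogates for each token's marginal contribution to the log-determinant.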