An experimental study of KV cache reuse strategies in chunk-level caching systems
arXiv cs.CL / 3/24/2026
Key Points
- The paper studies chunk-level caching (CLC) for retrieval-augmented generation, where KV caches are precomputed for retrieved text chunks to speed up LLM inference.
- It finds that existing CLC approaches have fundamental limitations: KV caches precomputed per chunk miss cross-attention dependencies between chunks, which can hurt output quality.
- The authors provide extensive experimental evaluation of current CLC system designs to quantify accuracy limits and applicability constraints.
- They also conclude that different CLC techniques can be complementary, and they propose a redesigned CLC approach that combines them to improve accuracy.
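
To make the cross-chunk dependency issue concrete, here is a minimal sketch in NumPy. It is not the paper's system or any real CLC implementation: the two-layer single-head model, the random weights, the chunk sizes, and the helper names (`causal_attention`, `kv_caches`) are all assumptions chosen for illustration, and position encodings are omitted. It compares KV caches computed jointly over two concatenated chunks with caches precomputed per chunk and then stitched together.

```python
import numpy as np


def causal_attention(q, k, v):
    """Single-head causal self-attention over one sequence."""
    n = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[1])
    # Mask out future positions so token i only attends to tokens <= i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def kv_caches(x, layers):
    """Run a toy attention-only model and return the (K, V) pair of each layer."""
    caches = []
    h = x
    for wq, wk, wv in layers:
        q, k, v = h @ wq, h @ wk, h @ wv
        caches.append((k, v))
        h = h + causal_attention(q, k, v)  # residual connection
    return caches


rng = np.random.default_rng(0)
dim, len_a, len_b = 8, 4, 5  # hypothetical sizes
layers = [
    tuple(rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
    for _ in range(2)
]
chunk_a = rng.normal(size=(len_a, dim))
chunk_b = rng.normal(size=(len_b, dim))

# Reference: both chunks processed together, so chunk B attends to chunk A.
joint = kv_caches(np.concatenate([chunk_a, chunk_b]), layers)

# Chunk-level caching: each chunk processed in isolation, caches concatenated.
cached_a = kv_caches(chunk_a, layers)
cached_b = kv_caches(chunk_b, layers)
stitched = [
    (np.concatenate([ka, kb]), np.concatenate([va, vb]))
    for (ka, va), (kb, vb) in zip(cached_a, cached_b)
]

# Layer 0 keys depend only on the input embeddings, so they match.
layer0_gap = np.abs(joint[0][0] - stitched[0][0]).max()
# Layer 1 keys of chunk A match too: causal masking hides chunk B from it.
layer1_gap_a = np.abs(joint[1][0][:len_a] - stitched[1][0][:len_a]).max()
# Layer 1 keys of chunk B differ: its hidden states never saw chunk A.
layer1_gap_b = np.abs(joint[1][0][len_a:] - stitched[1][0][len_a:]).max()

print("layer 0 gap:", layer0_gap)
print("layer 1 gap, chunk A:", layer1_gap_a)
print("layer 1 gap, chunk B:", layer1_gap_b)
```

In this toy setting only the second chunk's deeper-layer cache deviates from the joint computation, which is the kind of missing dependency the key points describe. How much such a gap affects output quality in a real model is what the paper's experiments evaluate; this sketch does not measure that.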