CodeComp: Structural KV Cache Compression for Agentic Coding

arXiv cs.CL / 4/14/2026

Key Points

CodeComp targets agentic coding tasks over long codebases, where the LLM's KV cache becomes the primary inference bottleneck under limited memory, making KV cache compression a key lever for inference efficiency.
Prior attention-only KV compression can incorrectly discard structurally critical code tokens (e.g., call sites, branch conditions, assignments) that are important for program understanding.
CodeComp is a training-free compression method that injects static program analysis into inference by building Code Property Graph priors extracted with Joern.
On bug localization and code generation benchmarks, CodeComp outperforms attention-only compression baselines under equal memory budgets and recovers most of the full-context accuracy under aggressive compression; its patch generation quality matches that of uncompressed full-context inference.
The approach is reported to integrate seamlessly into SGLang-based agentic coding pipelines without any model modification.
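
To make the memory pressure concrete, here is a back-of-the-envelope estimate of KV cache size for a standard decoder-only transformer. The model dimensions (32 layers, 8 KV heads, head size 128, fp16) and the 8x compression ratio are illustrative assumptions, not figures from the paper.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate the KV cache size in bytes for a single sequence."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len


# Hypothetical model configuration (assumption, not from the paper).
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)

full = kv_cache_bytes(seq_len=131_072, **config)            # 128K-token context
compressed = kv_cache_bytes(seq_len=131_072 // 8, **config)  # keep 1 token in 8

print(full / 2**30)        # 16.0 (GiB)
print(compressed / 2**30)  # 2.0 (GiB)
```

Under these assumptions, a single 128K-token context costs 16 GiB of cache on top of the model weights, which is why the token retention budget matters so much for long-codebase workloads.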

Abstract

Agentic code tasks such as fault localization and patch generation require processing long codebases under tight memory constraints, where the Key-Value (KV) cache becomes the primary inference bottleneck. Existing compression methods rely exclusively on attention signals to estimate token importance, systematically discarding structurally critical tokens such as call sites, branch conditions, and assignments that are essential for code understanding. We present CodeComp, a training-free KV cache compression framework that incorporates static program analysis into LLM inference via Code Property Graph priors extracted by Joern. Across bug localization and code generation benchmarks, CodeComp consistently outperforms attention-only compression baselines under equal memory budgets, recovering the majority of full-context accuracy under aggressive KV cache compression, while matching the patch generation quality of uncompressed full-context inference and integrating seamlessly into SGLang-based agentic coding pipelines without model modification.
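
The abstract does not say how the structural priors are combined with attention signals, so the following is only a minimal sketch of the general idea: boost the importance score of tokens flagged as structurally critical before choosing which KV entries to keep under a fixed budget. The function name `select_kv_tokens`, the additive `prior_weight`, and the toy scores are all hypothetical; in CodeComp the structural flags would come from a Code Property Graph extracted by Joern, which is not modeled here.

```python
from typing import List, Sequence


def select_kv_tokens(attn_scores: Sequence[float],
                     is_structural: Sequence[bool],
                     budget: int,
                     prior_weight: float = 1.0) -> List[int]:
    """Return the positions of the tokens whose KV entries are retained.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    max-normalized attention score plus an additive bonus for tokens
    marked as structurally critical.
    """
    if len(attn_scores) != len(is_structural):
        raise ValueError("attn_scores and is_structural must have equal length")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Normalize attention so the structural bonus is on a comparable scale.
    top = max(attn_scores, default=0.0)
    scale = top if top > 0 else 1.0
    combined = [a / scale + (prior_weight if s else 0.0)
                for a, s in zip(attn_scores, is_structural)]

    # Keep the `budget` highest-scoring positions, returned in document order.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:budget])


# Toy example: positions 2 and 4 stand in for a branch condition and a call site
# that receive little attention but matter structurally.
attn = [0.9, 0.1, 0.05, 0.6, 0.02, 0.4]
structural = [False, False, True, False, True, False]

print(select_kv_tokens(attn, structural, budget=3, prior_weight=0.0))  # [0, 3, 5]
print(select_kv_tokens(attn, structural, budget=3, prior_weight=1.0))  # [0, 2, 4]
```

With `prior_weight=0.0` the selection is attention-only and drops both structural tokens; with the prior enabled they survive at the same budget, which is the failure mode and the remedy the abstract describes.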