SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
arXiv cs.CL / 3/17/2026
📰 News · Models & Research
Key Points
- SemantiCache is a semantic-aware KV cache compression framework that preserves semantic integrity by aligning compression with the structure of the language itself.
- It partitions the cache into semantically coherent chunks along natural semantic boundaries, then applies Greedy Seed-Based Clustering within each chunk to form semantic clusters (see the first sketch after this list).
- Each cluster is merged into a single semantic core, and a Proportional Attention mechanism rebalances attention weights after merging so cores retain the influence of the tokens they replace (see the second sketch after this list).
- Empirical results show decoding speedups of up to 2.61x and substantial reductions in memory footprint, with task performance comparable to the original model.
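
The summary gives only the outline of the method, so here is a minimal Python/PyTorch sketch of the chunking and clustering stages under stated assumptions: semantic boundaries come from a boolean boundary mask (e.g., sentence-final punctuation), similarity is cosine similarity between per-head key vectors, and the first unassigned token seeds each cluster. `split_into_chunks`, `greedy_seed_clustering`, and `sim_threshold` are hypothetical names, not the paper's API.

```python
import torch

def split_into_chunks(boundary_mask: torch.Tensor) -> list[range]:
    """Partition cache positions into chunks at semantic boundaries.

    boundary_mask: (seq_len,) bool tensor, True where a token closes a
    semantic unit (e.g., sentence-final punctuation). This is a
    hypothetical stand-in for the paper's boundary detector.
    """
    chunks, start = [], 0
    for i, is_boundary in enumerate(boundary_mask.tolist()):
        if is_boundary:
            chunks.append(range(start, i + 1))
            start = i + 1
    if start < boundary_mask.numel():          # trailing partial chunk
        chunks.append(range(start, boundary_mask.numel()))
    return chunks

def greedy_seed_clustering(keys: torch.Tensor,
                           sim_threshold: float = 0.85) -> list[list[int]]:
    """Greedy seed-based clustering of one chunk's key vectors.

    keys: (chunk_len, head_dim) key vectors for a single attention head.
    The first unassigned token seeds a cluster; every remaining token
    whose cosine similarity to the seed clears sim_threshold joins it.
    """
    normed = torch.nn.functional.normalize(keys, dim=-1)
    unassigned = list(range(keys.shape[0]))
    clusters: list[list[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        sims = (normed[unassigned] @ normed[seed]).tolist()
        members, rest = [seed], []
        for idx, s in zip(unassigned, sims):
            (members if s >= sim_threshold else rest).append(idx)
        unassigned = rest
        clusters.append(members)
    return clusters
```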
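
Once clusters exist, each is collapsed into a single semantic core. The paper's pooling rule isn't given in this summary, so the sketch below mean-pools each cluster's keys and values and then applies ToMe-style proportional attention: adding log(cluster size) to a core's logit, which is mathematically equivalent to attending over that many identical copies of the core, restoring the attention mass the merged tokens would have drawn. `merge_clusters` and the mean-pooling choice are assumptions.

```python
import math
import torch

def merge_clusters(keys: torch.Tensor, values: torch.Tensor,
                   clusters: list[list[int]]):
    """Collapse each cluster into one semantic core by mean-pooling its
    key/value vectors; also record how many tokens each core replaces.
    Mean-pooling is an assumption -- the paper may weight members differently.
    """
    core_k = torch.stack([keys[idx].mean(dim=0) for idx in clusters])
    core_v = torch.stack([values[idx].mean(dim=0) for idx in clusters])
    sizes = torch.tensor([len(idx) for idx in clusters], dtype=keys.dtype)
    return core_k, core_v, sizes

def proportional_attention(q: torch.Tensor, core_k: torch.Tensor,
                           core_v: torch.Tensor, sizes: torch.Tensor) -> torch.Tensor:
    """Single-query attention over merged cores, rebalanced by cluster size.

    Adding log(size) to a core's logit gives it the same softmax mass as
    `size` identical unmerged entries would have received.
    q: (head_dim,), core_k/core_v: (num_cores, head_dim), sizes: (num_cores,).
    """
    scale = math.sqrt(q.shape[-1])
    logits = core_k @ q / scale + sizes.log()
    weights = torch.softmax(logits, dim=-1)
    return weights @ core_v
```

Chaining the two sketches, one attention head at a time (chunk, cluster, merge, then decode with proportional attention), reproduces the pipeline the key points describe.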
Related Articles

- I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game. (Dev.to)
- My AI Does Not Have a Clock (Dev.to)
- How to settle on a coding LLM? What parameters to watch out for? (Reddit r/LocalLLaMA)
- Andrej Karpathy's autonomous AI research agent ran 700 experiments in 2 days and gave a glimpse of where AI is heading (Reddit r/artificial)
- So Cursor admits that Kimi K2.5 is the best open source model (Reddit r/LocalLLaMA)