SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
arXiv cs.CL, March 17, 2026
📰 News · Models & Research
Key Points
- SemantiCache proposes a semantic-aware KV cache compression framework that preserves semantic integrity by aligning compression with language structure.
- It partitions the cache into semantically coherent chunks along natural semantic boundaries and applies Greedy Seed-Based Clustering within each chunk to form semantic clusters.
- Each cluster is merged into a single semantic core, and a Proportional Attention mechanism rebalances attention weights after merging.
- Empirical results show decoding speedups up to 2.61x and substantial memory footprint reductions, with performance comparable to the original model.
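The pipeline in the points above can be sketched in a few dozen lines. This is a minimal, hypothetical reconstruction from the summary alone, not the paper's implementation: the similarity threshold, the use of cosine similarity for seed clustering, mean-pooling as the merge operation, and scaling softmax weights by cluster size as "Proportional Attention" are all assumptions.

```python
import numpy as np

def greedy_seed_cluster(vectors, sim_threshold=0.8):
    """Greedy seed-based clustering (assumed variant): the first
    unassigned vector becomes a seed; each later vector joins the first
    seed whose cosine similarity exceeds the threshold, else it seeds a
    new cluster. Returns a list of member-index lists."""
    units = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-8)
    clusters = []  # list of (seed_unit_vector, member_indices)
    for i, v in enumerate(units):
        for seed, members in clusters:
            if float(seed @ v) >= sim_threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return [members for _, members in clusters]

def compress_chunk(keys, values, sim_threshold=0.8):
    """Merge each cluster into one 'semantic core' (here: mean key and
    mean value) and record cluster sizes for proportional attention."""
    clusters = greedy_seed_cluster(keys, sim_threshold)
    core_k = np.stack([keys[m].mean(axis=0) for m in clusters])
    core_v = np.stack([values[m].mean(axis=0) for m in clusters])
    sizes = np.array([len(m) for m in clusters], dtype=np.float64)
    return core_k, core_v, sizes

def proportional_attention(query, core_k, core_v, sizes):
    """Softmax attention over merged cores, with each core's weight
    scaled by the number of original entries it absorbed (assumed
    reading of 'Proportional Attention')."""
    d = query.shape[-1]
    scores = core_k @ query / np.sqrt(d)
    w = sizes * np.exp(scores - scores.max())
    w = w / w.sum()
    return w @ core_v
```

In this sketch a chunk's N key/value pairs shrink to one entry per cluster, and the size-scaled softmax approximates the attention mass the merged entries would have received individually; the paper's actual chunk-boundary detection and merge rule may differ.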
Related Articles
When AI Grows Up: Identity, Memory, and What Persists Across Versions
Dev.to
OpenAI is throwing everything into building a fully automated researcher
MIT Technology Review
Kimi just published a paper replacing residual connections in transformers. Results look legit
Reddit r/LocalLLaMA
A Summary of Optimization Targets in Machine Learning (also useful for E-certification exam prep)
Qiita
14 Best Self-Hosted Claude Alternatives for AI and Coding in 2026
Dev.to