TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization
arXiv cs.LG / 3/31/2026
Key Points
- TurboAngle compresses transformer KV caches by quantizing the angles of consecutive element pairs after applying a random diagonal rotation followed by a Fast Walsh-Hadamard Transform, which makes the pairs more uniformly distributed on the unit circle.
- The method adds an “early-boost” mechanism that selects K and V codebook sizes independently for each layer, giving higher precision to a model-specific subset of critical layers.
- Experiments across seven models (1B–7B parameters) show lossless quality for four models and near-lossless quality for six of seven at roughly 3.28–3.67 angle bits per element.
- An asymmetric quantization variant (8-bit keys, 4-bit log-space values) achieves 6.56 total bits per element on Mistral-7B with only +0.0014 perplexity degradation and no calibration data.
- A sensitivity analysis identifies model-specific bottleneck patterns, including K-dominated versus V-dominated layers and negative-transfer layers where allocating more precision can worsen quality.
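The core pipeline in the first key point can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: function names (`fwht`, `angle_roundtrip`) and the choice to keep pair magnitudes at full precision are assumptions for clarity; the actual method also compresses the remaining information (e.g., the log-space values mentioned above).

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal Fast Walsh-Hadamard Transform (last-dim length must be a power of 2).
    With the 1/sqrt(n) scaling the transform is its own inverse."""
    x = x.astype(np.float64).copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[..., j].copy(), x[..., j + h].copy()
                x[..., j], x[..., j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

def angle_roundtrip(v: np.ndarray, bits: int = 4, seed: int = 0) -> np.ndarray:
    """Quantize and immediately dequantize v via uniform angle quantization of
    consecutive element pairs in the randomly sign-flipped Hadamard domain.
    Pair magnitudes are kept at full precision in this sketch (an assumption)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=v.shape[-1])  # random diagonal rotation
    z = fwht(v * signs)
    x, y = z[..., 0::2], z[..., 1::2]                  # consecutive element pairs
    r = np.hypot(x, y)                                 # per-pair magnitude (stored as-is)
    theta = np.arctan2(y, x)                           # angle in (-pi, pi]
    levels = 2 ** bits
    # Uniform angle bins over the circle; theta = pi wraps onto bin 0 (same point).
    q = np.floor((theta + np.pi) / (2 * np.pi) * levels).astype(int) % levels
    theta_hat = (q + 0.5) * (2 * np.pi / levels) - np.pi  # decode to bin centers
    z_hat = np.empty_like(z)
    z_hat[..., 0::2] = r * np.cos(theta_hat)
    z_hat[..., 1::2] = r * np.sin(theta_hat)
    return fwht(z_hat) * signs                         # invert transform and sign flip
```

Because the sign flip and orthonormal Hadamard transform preserve Euclidean norms, the worst-case relative reconstruction error is bounded by the maximum angular error of pi / 2^bits radians, so more angle bits translate directly into lower error.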


