Just read Google's recent blog post. They're claiming 6x KV cache compression with zero accuracy loss and up to an 8x attention speedup on H100s. Presented at ICLR 2026.
Curious if anyone has tried it, and what real-world gains they got outside the paper's benchmarks.
