I implemented two recent ideas for long-context inference / KV-cache compaction and open-sourced both reproductions:
- Cartridges: https://github.com/shreyansh26/cartridges
- STILL: https://github.com/shreyansh26/STILL-Towards-Infinite-Context-Windows
The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.
Broadly:
- Cartridges reproduces corpus-specific compressed KV caches
- STILL reproduces reusable neural KV-cache compaction
- the STILL repo also compares against full-context inference, truncation, and cartridges
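For readers new to the idea: KV-cache compaction replaces a long per-token cache of key/value vectors with a much shorter one so attention over long contexts costs less memory. This toy sketch (my own illustration, not code from either repo, which learn the compression) shows the memory arithmetic with naive chunked mean-pooling over a NumPy array standing in for one head's cache:

```python
import numpy as np

def compact_kv(kv: np.ndarray, ratio: int = 4) -> np.ndarray:
    """Toy KV-cache compaction: mean-pool consecutive chunks of
    `ratio` cached token vectors into a single slot.

    kv: (seq_len, d) array of cached key (or value) vectors.
    Returns an array with roughly seq_len / ratio rows.
    """
    seq_len, d = kv.shape
    n_chunks = seq_len // ratio
    # pool the part that divides evenly into chunks
    pooled = kv[: n_chunks * ratio].reshape(n_chunks, ratio, d).mean(axis=1)
    # keep any leftover tokens uncompressed
    tail = kv[n_chunks * ratio :]
    return np.concatenate([pooled, tail], axis=0) if len(tail) else pooled

kv = np.random.randn(1024, 64)          # 1024 cached tokens, head dim 64
compacted = compact_kv(kv, ratio=8)     # -> (128, 64): 8x smaller cache
```

The real methods differ in how the short cache is produced (trained, corpus-specific, or reusable across inputs), but the payoff is the same: attention now runs over 128 slots instead of 1024 tokens.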
Here are the original papers / blogs:
- Cartridges: https://arxiv.org/abs/2506.06266
- STILL: https://www.baseten.co/research/towards-infinite-context-windows-neural-kv-cache-compaction/
Would be useful if you’re interested in long-context inference, memory compression, or practical systems tradeoffs around KV-cache reuse.