Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]

Reddit r/MachineLearning / 4/21/2026


Key Points

  • The author open-sourced two single-GPU reproductions for long-context inference and neural KV-cache compaction, providing runnable benchmark code and readable implementations.
  • The “cartridges” repository reproduces the original paper’s corpus-specific compressed KV caches, i.e. KV-cache reuse/compression tailored to a specific dataset.
  • The “STILL” repository reproduces reusable neural KV-cache compaction methods and includes comparisons against full-context inference, simple truncation, and cartridges.
  • The project’s stated aim is to make these research ideas easier to inspect and experiment with, rather than relying only on paper/blog summaries.
  • These releases are positioned for practitioners interested in long-context inference, memory compression, and system-level tradeoffs around KV-cache reuse.

I implemented two recent ideas for long-context inference / KV-cache compaction and open-sourced both reproductions:

The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.

Broadly:

  • cartridges reproduces corpus-specific compressed KV caches
  • STILL reproduces reusable neural KV-cache compaction
  • the STILL repo also compares against full-context inference, truncation, and cartridges
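To make the comparison axes concrete, here is a toy sketch of the three baselines the STILL repo reportedly measures against: full-context inference (keep every cached token), simple truncation (keep only the most recent window), and a compacted cache. The mean-pooling "compaction" below is a crude stand-in of my own for illustration, not the learned neural compaction from the paper or either repo:

```python
# Toy KV-cache baselines, treating one attention head's cache as a
# (seq_len, head_dim) array of cached key (or value) vectors.
import numpy as np

def full_context(kv):
    # Baseline 1: keep the entire cache (maximum memory, no information loss).
    return kv

def truncate(kv, window):
    # Baseline 2: keep only the most recent `window` entries.
    return kv[-window:]

def mean_pool_compact(kv, block):
    # Stand-in "compaction": average consecutive blocks of entries into one.
    # (The actual papers learn the compacted entries; this is only a sketch.)
    n = (len(kv) // block) * block
    pooled = kv[:n].reshape(-1, block, kv.shape[1]).mean(axis=1)
    return np.concatenate([pooled, kv[n:]], axis=0)

kv = np.random.randn(1024, 64)         # 1024 cached tokens, head dim 64
print(full_context(kv).shape)          # (1024, 64) -- full cost
print(truncate(kv, 256).shape)         # (256, 64)  -- cheap, lossy tail
print(mean_pool_compact(kv, 8).shape)  # (128, 64)  -- 8x fewer entries
```

The interesting tradeoff the repos benchmark is exactly this memory-vs-quality curve: truncation and compaction shrink the cache by similar factors, but discard very different information.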

Here are the original papers / blogs -

Would be useful if you’re interested in long-context inference, memory compression, or practical systems tradeoffs around KV-cache reuse.

submitted by /u/shreyansh26