Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]

Reddit r/MachineLearning / 4/21/2026


Key Points

  • The author open-sourced two single-GPU reproductions for long-context inference and neural KV-cache compaction, providing runnable benchmark code and readable implementations.
  • The “cartridges” repository reproduces the original paper’s corpus-specific compressed KV caches, i.e. KV-cache reuse/compression tailored to a specific dataset.
  • The “STILL” repository reproduces reusable neural KV-cache compaction methods and includes comparisons against full-context inference, simple truncation, and cartridges.
  • The project’s stated aim is to make these research ideas easier to inspect and experiment with, rather than relying only on paper/blog summaries.
  • These releases are positioned for practitioners interested in long-context inference, memory compression, and system-level tradeoffs around KV-cache reuse.

I implemented two recent ideas for long-context inference / KV-cache compaction and open-sourced both reproductions:

The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.

Broadly:

  • cartridges reproduces corpus-specific compressed KV caches
  • STILL reproduces reusable neural KV-cache compaction
  • the STILL repo also compares against full-context inference, truncation, and cartridges
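To make the comparison axes concrete, here is a toy sketch of the three baselines the STILL repo reportedly measures against: full-context inference (keep every cached token), simple truncation (keep only the most recent window), and a compacted cache. The mean-pooling "compaction" below is a crude stand-in of my own for illustration, not the learned neural compaction from the paper or either repo:

```python
# Toy KV-cache baselines, treating one attention head's cache as a
# (seq_len, head_dim) array of cached key (or value) vectors.
import numpy as np

def full_context(kv):
    # Baseline 1: keep the entire cache (maximum memory, no information loss).
    return kv

def truncate(kv, window):
    # Baseline 2: keep only the most recent `window` entries.
    return kv[-window:]

def mean_pool_compact(kv, block):
    # Stand-in "compaction": average consecutive blocks of entries into one.
    # (The actual papers learn the compacted entries; this is only a sketch.)
    n = (len(kv) // block) * block
    pooled = kv[:n].reshape(-1, block, kv.shape[1]).mean(axis=1)
    return np.concatenate([pooled, kv[n:]], axis=0)

kv = np.random.randn(1024, 64)         # 1024 cached tokens, head dim 64
print(full_context(kv).shape)          # (1024, 64) -- full cost
print(truncate(kv, 256).shape)         # (256, 64)  -- cheap, lossy tail
print(mean_pool_compact(kv, 8).shape)  # (128, 64)  -- 8x fewer entries
```

The interesting tradeoff the repos benchmark is exactly this memory-vs-quality curve: truncation and compaction shrink the cache by similar factors, but discard very different information.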

Here are the original papers / blogs -

Would be useful if you’re interested in long-context inference, memory compression, or practical systems tradeoffs around KV-cache reuse.

submitted by /u/shreyansh26