I’ve been experimenting with HydraLM, a long-context inference model, and the numbers are getting a bit wild. The repo’s benchmark suite reports:

- 1.00 retrieval accuracy even when the target fact is buried at 90% depth in a 1M-token context
- p@1 = 0.987 and p@8 = 0.999 on a 1M-key fact bank
- up to 1.8× faster generation with speculative decoding
- about 99.8% FLOP savings and full memory savings at long context

The benchmark docs, reproduction scripts, and verification logs are all public, so anyone can check the results for themselves: https://github.com/byte271/HydraLM
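For context on the depth figure: in a standard needle-in-a-haystack test, a known fact (the “needle”) is spliced into filler text at some fractional position of the context, and the model is asked to recall it. Here’s a minimal sketch of that setup; every name in it (build_haystack, model.generate, the case tuple format) is my own placeholder, not anything from the HydraLM repo:

```python
# Hypothetical needle-in-a-haystack probe -- placeholder names, not the
# HydraLM repo's actual harness. "90% depth" means the needle starts
# 90% of the way into the context.

def build_haystack(filler_text, needle_text, depth):
    """Splice the needle so it starts at fraction `depth` of the filler."""
    pos = int(len(filler_text) * depth)
    return filler_text[:pos] + needle_text + filler_text[pos:]

def retrieval_accuracy(model, cases, depth=0.9):
    """Fraction of cases where the model's reply contains the hidden answer.

    cases: iterable of (filler_text, needle_text, question, answer) strings;
    model.generate is an assumed interface, not a documented API.
    """
    hits = 0
    total = 0
    for filler, needle, question, answer in cases:
        context = build_haystack(filler, needle, depth)
        reply = model.generate(context + "\n" + question)
        hits += int(answer in reply)
        total += 1
    return hits / total
```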
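As for the fact-bank numbers, I’m reading p@k as hit rate at k, i.e. the fraction of queries whose correct key shows up in the model’s top-k retrieved keys; that’s the usual scoring when each query has exactly one right answer (and it’s the only reading under which p@8 can reach 0.999). A sketch of that metric under the same caveat, placeholder names and all:

```python
# Hypothetical p@k scorer -- a sketch under my reading of the metric,
# not code from the repo. Treats p@k as hit rate: the single correct
# key for a query appears somewhere in the top-k retrieved keys.

def precision_at_k(rankings, gold, k):
    """rankings: {query: ranked list of candidate keys}
    gold: {query: the single correct key}
    Returns the fraction of queries whose correct key is in the top k.
    """
    hits = sum(gold[q] in ranked[:k] for q, ranked in rankings.items())
    return hits / len(rankings)

# e.g. p@1 = precision_at_k(rankings, gold, k=1)
#      p@8 = precision_at_k(rankings, gold, k=8)
```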


