TurboQuant implementations have been appearing recently in llama.cpp, mlx, vllm, and sglang, but much of the surrounding discussion and code feels noisy and looks AI-generated.
I’m trying to understand which claims from the paper have actually been validated by independent third parties. For example, has the lossless compression claim been reproduced, and how does TurboQuant perform in practice compared with other low-bit quantization methods?
I spent a full day reproducing the TurboQuant+QJL setup, and in my tests it only made performance worse, so I'm wondering whether QJL actually provides a meaningful practical benefit here.
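For anyone wanting to sanity-check the QJL side of this independently of TurboQuant: here is a minimal harness I'd use to compare a QJL-style estimator against plain round-to-nearest quantization on synthetic data. This is my own sketch, not code from either paper; the dimensions, the 4-bit RTN baseline, and the sign-sketch inner-product estimator (1-bit signs of a Gaussian projection, rescaled by the key norm) are all my assumptions about the general approach, so treat the numbers as illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 128, 256, 1000  # hypothetical dims: key width, sketch size, num keys

keys = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)
exact = keys @ query                      # ground-truth inner products

# --- QJL-style estimate (my sketch, not the paper's exact construction) ---
# Project with a random Gaussian matrix, keep only the signs of the
# projected keys (1 bit each), and rescale by the stored key norms.
S = rng.standard_normal((m, d)).astype(np.float32)
proj_k = np.sign(keys @ S.T)              # (n, m) 1-bit sketch per key
proj_q = S @ query                        # (m,) full-precision query side
norms = np.linalg.norm(keys, axis=1)
est_qjl = np.sqrt(np.pi / 2) * norms * (proj_k @ proj_q) / m

# --- Baseline: per-row 4-bit round-to-nearest on the keys ---
scale = np.abs(keys).max(axis=1, keepdims=True) / 7.0
deq = np.clip(np.round(keys / scale), -8, 7) * scale
est_rtn = deq @ query

# Errors normalized by ||k|| * ||q|| so the two methods are comparable.
denom = norms * np.linalg.norm(query)
err_qjl = float(np.mean(np.abs(est_qjl - exact) / denom))
err_rtn = float(np.mean(np.abs(est_rtn - exact) / denom))
print(err_qjl, err_rtn)
```

On random Gaussian data like this, the 1-bit sketch trades far more accuracy for its compression than 4-bit RTN does, which is why the comparison only makes sense at matched bit budgets and on real KV-cache activations rather than synthetic inputs.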