Hey everyone. Like many of you, I've come across TurboQuant, RaBitQ, QuIP, the recent llama.cpp work, and others. I've been profiling what global rotation actually does to hidden states during low-bit quantization. I think it's worth discussing because it touches almost every global-rotation scheme, and in the paper I try to explain the "why" behind the intuitions I've traced in community discussions.

The usual story is:
• naive low-bit quantization destroys outliers
• rotation spreads them out
• scalar quantization works much better after that

That part seems true. But when I measured the reconstructed hidden states directly on Qwen-2.5-1.5B at 3-bit, I found this tradeoff:
• outlier reconstruction gets dramatically better with rotation
• cosine similarity improves
• MSE on the big spikes gets much better
• but sparsity gets wrecked

I measured 381,999 ghost activations after rotation + quantization: neurons that were effectively quiet in FP16 but became strongly active after the rotated reconstruction. So rotation seems to solve one problem by creating another: **it prevents hard clipping, but it fills the quiet part of the manifold with false firings.**

I've only tested this up to the 7B Qwen models because of compute limits; for the 20B results I used Gerganov's recent llama.cpp PR, which is also explained in the paper.

If anyone wants to poke holes in this, reproduce it, or suggest better sparsity metrics, I'd genuinely appreciate it.

• Code: https://github.com/pheonix-delta/llm-isotropic-tradeoff — easy to run on Colab. I've fixed the sampling seeds so you get the exact metrics before reading the paper; if you want to try random seeds, I've commented what to delete.
• Draft: https://doi.org/10.5281/zenodo.19338651 — also linked from the GitHub repo.

This isn't the end of my work. I'm posting here to get more feedback and discussion, to further improve the repo and strengthen the paper.
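For anyone who wants to sanity-check the claim before opening the repo, here's a minimal NumPy sketch of the kind of measurement described above: rotate hidden states with a random orthogonal matrix (a stand-in for the Hadamard-style rotations in QuIP/TurboQuant-like schemes), scalar-quantize to 3 bits, rotate back, and count "ghost activations". The `quiet`/`loud` thresholds and function names here are my own illustrative choices, not the paper's exact metric:

```python
import numpy as np

def random_rotation(d, seed=0):
    # Haar-ish random orthogonal matrix via QR decomposition
    # (illustrative stand-in for a Hadamard-style rotation).
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # fix column signs for uniformity

def quantize_symmetric(x, bits=3):
    # Per-tensor symmetric scalar quantization to `bits` bits.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale).clip(-levels, levels) * scale

def ghost_activations(h, bits=3, quiet=0.05, loud=0.5, seed=0):
    # Count entries that were "quiet" in the original hidden states
    # but become "loud" after rotate -> quantize -> rotate-back.
    # Thresholds are hypothetical, chosen only for illustration.
    R = random_rotation(h.shape[-1], seed)
    h_hat = quantize_symmetric(h @ R, bits) @ R.T
    return int(np.sum((np.abs(h) < quiet) & (np.abs(h_hat) > loud)))
```

On a mostly-sparse activation matrix with a few large outliers, `ghost_activations` tends to come back nonzero: the rotation smears quantization noise from the outlier directions across previously quiet coordinates, which is exactly the sparsity-vs-clipping tradeoff described above.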
[R] The loophole in TurboQuant: it saves reasoning outliers by permanently polluting the semantic noise floor.
Reddit r/LocalLLaMA / 3/31/2026
Key Points
- They provide a runnable Colab-friendly repository and a Zenodo draft to support verification and further community discussion.