Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization
arXiv cs.LG / 5/4/2026
Key Points
- The paper introduces Activation Residual Hessian Quantization (ARHQ), a post-training weight-splitting method aimed at reducing error propagation in low-bit activation/weight quantization.
- ARHQ builds an input-side residual Hessian G_x from activation quantization residuals, uses it to analytically identify error-sensitive weight directions, and routes those directions into a high-precision, low-rank branch (see the sketch after this list).
- The split is computed efficiently in closed form via a truncated SVD of the scaled weight matrix W G_x^{1/2}.
- Experiments on Qwen3-4B-Thinking-2507 show improved layer-wise SNR and better preservation of downstream reasoning performance on ZebraLogic under aggressive quantization.
- The authors provide an open-source implementation at the linked GitHub repository.
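A minimal NumPy sketch of the split the bullets above describe, under stated assumptions: the toy uniform quantizer, the calibration-set estimate G_x = RᵀR/n, the rank parameter, and the pseudoinverse mapping back to weight space are all illustrative guesses and not the paper's implementation.

```python
import numpy as np

def uniform_quantize(x, bits=4):
    """Toy symmetric uniform quantizer (placeholder for the real one)."""
    scale = max(np.abs(x).max(), 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def arhq_split(W, X, rank=16, act_bits=4, weight_bits=4):
    """Split W (d_out x d_in) into a low-bit part and a high-precision
    low-rank branch along activation-residual-sensitive directions.
    X is a calibration activation matrix of shape (n_samples, d_in)."""
    # 1) Activation quantization residuals on calibration data.
    R = X - uniform_quantize(X, act_bits)                 # (n, d_in)
    # 2) Input-side residual Hessian estimate G_x = R^T R / n (PSD).
    G_x = R.T @ R / R.shape[0]                            # (d_in, d_in)
    # 3) Symmetric square root G_x^{1/2} via eigendecomposition.
    evals, evecs = np.linalg.eigh(G_x)
    G_half = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    # 4) Closed-form split: truncated SVD of the scaled matrix W G_x^{1/2}.
    U, S, Vt = np.linalg.svd(W @ G_half, full_matrices=False)
    # 5) Map the top-r directions back to weight space; keep them in
    #    high precision and quantize only the residual weights.
    W_lr = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank] @ np.linalg.pinv(G_half)
    W_q = uniform_quantize(W - W_lr, weight_bits)         # low-bit branch
    return W_q, W_lr                                      # y ≈ (W_q + W_lr) x
```

At inference, the low-rank branch would presumably be stored as two thin high-precision factors rather than a dense W_lr, so the per-layer overhead stays at roughly O(r (d_in + d_out)) extra parameters.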