PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression

arXiv cs.CL / 4/1/2026


Key Points

  • PolarQuant is a post-training LLM weight quantization method designed for near-lossless compression by reshaping weight distributions before quantization.
  • The approach normalizes weights block-wise to a unit hypersphere, applies a Walsh-Hadamard rotation to make coordinates approximately Gaussian, then quantizes using centroids matched to that Gaussian distribution.
  • Ablation results show Hadamard rotation alone drives about 98% of the quality gains, improving Qwen3.5-9B perplexity from 6.90 (absmax Q5) to 6.40, only Δ = +0.03 above FP16, without any calibration data.
  • PolarQuant also serves as a preprocessing step that improves downstream INT4 quantization (torchao INT4), achieving lower perplexity (6.56 vs 6.68) while keeping strong throughput (43.1 tok/s at ~6.5 GB VRAM).
  • The authors provide public code and models, indicating the method is intended to be directly testable and reusable in compression/quantization pipelines.
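To build intuition for the rotation step, the sketch below (not the authors' code; the `hadamard` helper and the Laplace-distributed stand-in "weights" are illustrative assumptions) shows how an orthonormal Walsh-Hadamard rotation spreads outliers: each output coordinate becomes a ±1-signed sum of all block coordinates, so by a central-limit argument the result is approximately Gaussian.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    assert n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def hadamard_rotate(x: np.ndarray) -> np.ndarray:
    """Rotate each row with the orthonormal matrix H / sqrt(n).

    Every output coordinate mixes all n inputs with +/-1 signs, so
    heavy-tailed coordinates become approximately Gaussian while
    norms (and hence the hypersphere normalization) are preserved.
    """
    n = x.shape[-1]
    return x @ (hadamard(n) / np.sqrt(n))

rng = np.random.default_rng(0)
x = rng.laplace(size=(1024, 64))   # heavy-tailed stand-in for raw weights
y = hadamard_rotate(x)
# The outlier ratio max|.|/std shrinks after rotation toward Gaussian levels.
print(np.abs(x).max() / x.std(), np.abs(y).max() / y.std())
```

Because the rotation is orthonormal, it is exactly invertible (multiply by the transpose), which is what makes it usable as a lossless preprocessing step before quantization.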

Abstract

We present PolarQuant, a post-training weight quantization method for large language models (LLMs) that exploits the distributional structure of neural network weights to achieve near-lossless compression. PolarQuant operates in three stages: (1) block-wise normalization to the unit hypersphere, (2) Walsh-Hadamard rotation to transform coordinates into approximately Gaussian random variables, and (3) quantization with centroids matched to the Gaussian distribution. Our ablation reveals that Hadamard rotation alone accounts for 98% of the quality improvement, reducing Qwen3.5-9B perplexity from 6.90 (absmax Q5) to 6.40 (Delta = +0.03 from FP16), making it practically lossless without any calibration data. Furthermore, PolarQuant functions as an effective preprocessing step for downstream INT4 quantizers: PolarQuant Q5 dequantized and re-quantized by torchao INT4 achieves perplexity 6.56 versus 6.68 for direct absmax INT4, while maintaining 43.1 tok/s throughput at 6.5 GB VRAM. Code and models are publicly available.
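The three stages in the abstract can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the released implementation: the `hadamard` helper is a standard Sylvester construction, and the Gaussian-matched codebook is approximated here by empirical quantiles of a normal sample (the paper's exact centroid construction may differ, e.g. Lloyd-Max centroids).

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def gaussian_centroids(bits: int, sigma: float, rng) -> np.ndarray:
    """Approximate Gaussian-matched centroids via empirical quantiles of
    N(0, sigma^2) -- an illustrative stand-in for the paper's codebook."""
    k = 2 ** bits
    sample = rng.normal(scale=sigma, size=200_000)
    return np.quantile(sample, (np.arange(k) + 0.5) / k)

def polarquant_block(w: np.ndarray, bits: int = 5, rng=None):
    """Quantize weight blocks (rows of w) with the three stages:
    (1) normalize each block to the unit hypersphere,
    (2) rotate with an orthonormal Walsh-Hadamard matrix,
    (3) snap coordinates to Gaussian-matched centroids."""
    rng = rng or np.random.default_rng(0)
    n = w.shape[-1]
    Hn = hadamard(n) / np.sqrt(n)                       # orthonormal rotation
    scale = np.linalg.norm(w, axis=-1, keepdims=True)   # stage 1
    z = (w / scale) @ Hn                                # stage 2: ~N(0, 1/n) coords
    centroids = gaussian_centroids(bits, 1.0 / np.sqrt(n), rng)
    codes = np.argmin(np.abs(z[..., None] - centroids), axis=-1)  # stage 3
    return codes, scale, centroids, Hn

def dequantize_block(codes, scale, centroids, Hn):
    """Invert the pipeline: look up centroids, un-rotate, rescale."""
    return scale * (centroids[codes] @ Hn.T)

rng = np.random.default_rng(1)
w = rng.normal(size=(128, 64))
codes, scale, cents, Hn = polarquant_block(w, bits=5, rng=rng)
w_hat = dequantize_block(codes, scale, cents, Hn)
err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {err:.3f}")
```

Note that only the integer codes and one scale per block need to be stored; the rotation matrix and the Gaussian codebook are fixed and data-independent, which is why the method requires no calibration data.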