Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization
arXiv cs.CL / 4/10/2026
Key Points
- The paper finds that for extreme 2-bit additive quantization of LLMs, catastrophic failures are primarily driven by poor codebook initialization rather than by later search or fine-tuning alone.
- It shows that greedy sequential initialization often lands the model in poor optimization basins that beam search and PV-tuning cannot reliably escape, especially at tighter compression rates.
- Through an analysis based on the representational ratio (ρ̂ = N/KM), the authors show that the severity of the initialization bottleneck scales with codebook capacity relative to weight-group structure.
- They propose OA-EM, an output-aware EM initialization method that uses Hessian-weighted Mahalanobis distance, which consistently yields better quantized-model quality after PV-tuning.
- Across multiple architectures (Llama 3.2 3B, Llama 3.1 8B, Qwen 2.5 3B) and compression settings, OA-EM improves the quality-compute tradeoff and avoids the order-of-magnitude perplexity blow-ups otherwise seen at 2 bpp.
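To make the key points above concrete, here is a minimal sketch of a Hessian-weighted (diagonal Mahalanobis) EM/k-means codebook initialization, together with the representational ratio. This is an illustration only: the function name, the diagonal-Hessian simplification, and the reading of ρ̂ as N/(K·M) are assumptions, not the paper's exact OA-EM algorithm.

```python
import numpy as np

def hessian_weighted_kmeans(W, H_diag, K, iters=10, seed=0):
    """K-means over weight groups W (N x d) under a Hessian-weighted
    distance d(w, c) = sum_j H_j * (w_j - c_j)^2 (diagonal Mahalanobis).
    Sketch of an output-aware EM-style initialization; NOT the paper's
    exact OA-EM procedure."""
    rng = np.random.default_rng(seed)
    C = W[rng.choice(len(W), K, replace=False)].copy()  # init codewords from data
    assign = np.zeros(len(W), dtype=int)
    for _ in range(iters):
        # Hessian-weighted squared distances, shape (N, K)
        diff = W[:, None, :] - C[None, :, :]
        dist = np.einsum('nkd,d->nk', diff**2, H_diag)
        assign = dist.argmin(axis=1)
        # With a shared diagonal H, the per-cluster minimizer is the plain mean
        for k in range(K):
            members = W[assign == k]
            if len(members):
                C[k] = members.mean(axis=0)
    return C, assign

# Toy example: N weight groups of dimension d, K codewords, M codebooks
N, d, K, M = 256, 8, 16, 1
rng = np.random.default_rng(1)
W = rng.normal(size=(N, d))
H_diag = rng.uniform(0.5, 2.0, size=d)  # stand-in for a diagonal input Hessian
C, assign = hessian_weighted_kmeans(W, H_diag, K)
rho_hat = N / (K * M)  # representational ratio ρ̂ = N/KM, as written in the summary
print(C.shape, rho_hat)
```

Down-weighting dimensions with small Hessian entries means the codebook spends its limited capacity on the directions that most affect layer outputs, which is the intuition behind output-aware initialization.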