DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

arXiv cs.AI / 3/25/2026


Key Points

  • The paper proposes Delta-Aware Quantization (DAQ), a data-free post-training quantization method intended to preserve a post-trained LLM’s acquired knowledge.
  • It argues that standard quantization objectives can disproportionately damage the small-magnitude weight deltas (ΔW) that encode post-training behavior, effectively acting like harmful regularization.
  • DAQ replaces reconstruction-error metrics with two delta-aware objectives—Sign Preservation Rate and Cosine Similarity—to directly optimize the directional fidelity of ΔW using only the base and post-trained weight matrices.
  • In an FP8 pilot study, DAQ reportedly restores style-specific capabilities that standard quantization loses, while preserving general performance.
  • The approach is positioned as a practical post-training compression technique that avoids needing additional training/calibration data while targeting behavior preservation.
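The two delta-aware metrics above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact definitions of Sign Preservation Rate and Cosine Similarity in DAQ are not given here, so we assume the natural reading — compare the post-training delta ΔW = W_post − W_base against the delta that survives quantization, W_quant − W_base. The coarse round-to-nearest quantizer is a hypothetical stand-in for FP8.

```python
import numpy as np

def delta_metrics(w_base, w_post, w_quant):
    """Delta-aware fidelity of a quantized post-trained weight matrix.

    Hedged sketch: compares the post-training delta against the delta
    remaining after quantization, per the assumed reading of the paper.
    """
    delta = w_post - w_base      # ΔW encoding post-training behavior
    delta_q = w_quant - w_base   # ΔW after quantization noise

    # Sign Preservation Rate: fraction of entries whose sign survives.
    spr = float(np.mean(np.sign(delta) == np.sign(delta_q)))

    # Cosine similarity of the flattened deltas (directional fidelity).
    num = float(np.dot(delta.ravel(), delta_q.ravel()))
    den = float(np.linalg.norm(delta) * np.linalg.norm(delta_q))
    cos = num / den if den > 0 else 0.0
    return spr, cos

# Toy demonstration: small-magnitude deltas vs. a coarse quantizer.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64)).astype(np.float32)
w_post = w_base + 0.01 * rng.normal(size=(64, 64)).astype(np.float32)
step = 0.05                                  # quantization grid spacing
w_quant = np.round(w_post / step) * step     # round-to-nearest stand-in
spr, cos = delta_metrics(w_base, w_post, w_quant)
```

In this toy setting the rounding error (on the order of the 0.05 grid) swamps the 0.01-scale deltas, so both metrics degrade sharply — illustrating the paper's claim that base-model-agnostic quantization disproportionately corrupts small ΔW.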

Abstract

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas (ΔW) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. DAQ replaces reconstruction-based objectives with two delta-aware metrics -- Sign Preservation Rate and Cosine Similarity -- that directly optimize for directional fidelity of ΔW, requiring only the base and post-trained weight matrices. In a pilot FP8 study, DAQ recovers style-specific capabilities lost under standard quantization while maintaining general performance.