DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression

arXiv cs.AI / 3/25/2026


Key Points

  • The paper proposes Delta-Aware Quantization (DAQ), a data-free post-training quantization method intended to preserve a post-trained LLM’s acquired knowledge.
  • It argues that standard quantization objectives can disproportionately damage the small-magnitude weight deltas (ΔW) that encode post-training behavior, effectively acting like harmful regularization.
  • DAQ replaces reconstruction-error metrics with two delta-aware objectives—Sign Preservation Rate and Cosine Similarity—to directly optimize the directional fidelity of ΔW using only the base and post-trained weight matrices.
  • In an FP8 pilot study, DAQ reportedly restores style-specific capabilities that standard quantization loses, while preserving general performance.
  • The approach is positioned as a practical post-training compression technique that avoids needing additional training/calibration data while targeting behavior preservation.
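The two delta-aware metrics above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact definitions of Sign Preservation Rate and Cosine Similarity in DAQ are not given here, so we assume the natural reading — compare the post-training delta ΔW = W_post − W_base against the delta that survives quantization, W_quant − W_base. The coarse round-to-nearest quantizer is a hypothetical stand-in for FP8.

```python
import numpy as np

def delta_metrics(w_base, w_post, w_quant):
    """Delta-aware fidelity of a quantized post-trained weight matrix.

    Hedged sketch: compares the post-training delta against the delta
    remaining after quantization, per the assumed reading of the paper.
    """
    delta = w_post - w_base      # ΔW encoding post-training behavior
    delta_q = w_quant - w_base   # ΔW after quantization noise

    # Sign Preservation Rate: fraction of entries whose sign survives.
    spr = float(np.mean(np.sign(delta) == np.sign(delta_q)))

    # Cosine similarity of the flattened deltas (directional fidelity).
    num = float(np.dot(delta.ravel(), delta_q.ravel()))
    den = float(np.linalg.norm(delta) * np.linalg.norm(delta_q))
    cos = num / den if den > 0 else 0.0
    return spr, cos

# Toy demonstration: small-magnitude deltas vs. a coarse quantizer.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64)).astype(np.float32)
w_post = w_base + 0.01 * rng.normal(size=(64, 64)).astype(np.float32)
step = 0.05                                  # quantization grid spacing
w_quant = np.round(w_post / step) * step     # round-to-nearest stand-in
spr, cos = delta_metrics(w_base, w_post, w_quant)
```

In this toy setting the rounding error (on the order of the 0.05 grid) swamps the 0.01-scale deltas, so both metrics degrade sharply — illustrating the paper's claim that base-model-agnostic quantization disproportionately corrupts small ΔW.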

Abstract

We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas (ΔW) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization. DAQ replaces reconstruction-based objectives with two delta-aware metrics -- Sign Preservation Rate and Cosine Similarity -- that directly optimize for directional fidelity of ΔW, requiring only the base and post-trained weight matrices. In a pilot FP8 study, DAQ recovers style-specific capabilities lost under standard quantization while maintaining general performance.