From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

arXiv cs.CL / 4/23/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper analyzes why reducing LLM precision from 4-bit to 2-bit during post-training quantization often causes a sudden “performance cliff.”
It identifies two qualitatively different PTQ failure modes: Signal Degradation, driven by cumulative information/precision loss while computation patterns still function, and Computation Collapse, where key components stop working and corrupt signals early in the network.
The authors propose mechanism-aware interventions and show that training-free repair can mitigate Signal Degradation.
However, the same kind of repair does not work for Computation Collapse, implying that this issue requires structural reconstruction rather than simple compensation.
The work provides a systematic diagnostic framework to classify PTQ failures and choose appropriate mitigation strategies.

Abstract

Post-Training Quantization (PTQ) is critical for the efficient deployment of Large Language Models (LLMs). While 4-bit quantization is widely regarded as an optimal trade-off, reducing the precision to 2-bit usually triggers a catastrophic ``performance cliff.'' It remains unclear whether the underlying mechanisms differ fundamentally. Consequently, we conduct a systematic mechanistic analysis, revealing two qualitatively distinct failure modes: Signal Degradation, where the computational patterns remain intact but information precision is impaired by cumulative error; and Computation Collapse, where key components fail to function, preventing correct information processing and destroying the signal in the early layers. Guided by this diagnosis, we conduct mechanism-aware interventions, demonstrating that targeted, training-free repair can mitigate Signal Degradation, but remains ineffective for Computation Collapse. Our findings provide a systematic diagnostic framework for PTQ failures and suggest that addressing Computation Collapse requires structural reconstruction rather than mere compensation.

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Dev.to

Why use an AI gateway at all?

Dev.to

OpenAI Just Named It Workspace Agents. We Open-Sourced Our Lark Version Six Months Ago

Dev.to

GPT Image 2 Subject-Lock Editing: A Practical Guide to input_fidelity

Dev.to

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

Key Points

Abstract

Related Articles

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Why use an AI gateway at all?

OpenAI Just Named It Workspace Agents. We Open-Sourced Our Lark Version Six Months Ago

GPT Image 2 Subject-Lock Editing: A Practical Guide to input_fidelity

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer