
Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

arXiv cs.LG / 3/19/2026


Key Points

  • Introduces NSDS, a data-free, calibration-free layer-wise mixed-precision quantization framework that uses numerical and structural dual-sensitivity to guide bit allocation.
  • Mechanistically decomposes each layer into distinct operational roles and measures sensitivity from both numerical and structural perspectives.
  • Aggregates the dual-sensitivity scores into a unified layer-wise metric via MAD-Sigmoid and Soft-OR to drive bit allocation.
  • Outperforms various baseline LMPQ methods across diverse models and downstream tasks, without relying on calibration data.
  • Addresses a key limitation of prior methods that treat all intra-layer weight modules uniformly and rely on a single numerical property.
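The summary names MAD-Sigmoid normalization and Soft-OR aggregation but gives no formulas. The sketch below is one plausible reading, not the paper's actual method: each sensitivity signal is standardized with the median and median absolute deviation (MAD) and squashed through a logistic function, then the two normalized signals are combined with the probabilistic OR `1 - (1 - a)(1 - b)`. All variable names and the example scores are hypothetical.

```python
import numpy as np

def mad_sigmoid(scores):
    # Robust normalization: center on the median, scale by the median
    # absolute deviation (MAD), then squash into (0, 1) with a sigmoid.
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-12  # guard against zero spread
    return 1.0 / (1.0 + np.exp(-(scores - med) / mad))

def soft_or(a, b):
    # Probabilistic "soft OR": a layer counts as sensitive if EITHER
    # the numerical or the structural signal flags it.
    return 1.0 - (1.0 - np.asarray(a)) * (1.0 - np.asarray(b))

# Hypothetical raw per-layer sensitivities for four layers:
numerical  = [0.2, 1.5, 0.4, 3.0]   # e.g. outlier-style numerical statistics
structural = [0.9, 0.3, 0.8, 0.5]   # e.g. role/connectivity-based scores
layer_score = soft_or(mad_sigmoid(numerical), mad_sigmoid(structural))
```

The appeal of MAD over mean/standard deviation here is robustness: a single extreme layer does not distort the normalization of all the others, and the Soft-OR keeps a layer's score high when either signal alone marks it as sensitive.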

Abstract

Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property when estimating sensitivity, overlooking their distinct operational roles and structural characteristics. To address this, we propose NSDS, a novel calibration-free LMPQ framework driven by Numerical and Structural Dual-Sensitivity. Specifically, it first mechanistically decomposes each layer into distinct operational roles and quantifies their sensitivity from both numerical and structural perspectives. These dual-aspect scores are then aggregated into a unified layer-wise metric through a robust aggregation scheme based on MAD-Sigmoid and Soft-OR to guide bit allocation. Extensive experiments demonstrate that NSDS consistently achieves superior performance compared to various baselines across diverse models and downstream tasks, without relying on any calibration data.
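The abstract says the unified layer-wise metric "guides bit allocation" but does not specify the allocation procedure. A common way to realize this, shown here purely as an illustrative sketch (the greedy scheme, bit-width ladder, and budget are assumptions, not the paper's algorithm), is to start every layer at the lowest precision and promote the most sensitive layers first while an average bit budget holds:

```python
import numpy as np

def allocate_bits(scores, bit_choices=(2, 4, 8), avg_budget=3.0):
    """Greedy layer-wise bit allocation from unified sensitivity scores.

    Every layer starts at the lowest bit-width; the most sensitive
    layers are promoted one step at a time while the average
    bit-width stays within the budget.
    """
    scores = np.asarray(scores, dtype=float)
    bits = np.full(scores.size, min(bit_choices), dtype=float)
    ladder = sorted(bit_choices)
    for _ in range(len(ladder) - 1):          # at most this many promotions per layer
        for idx in np.argsort(scores)[::-1]:  # most sensitive layer first
            higher = [b for b in ladder if b > bits[idx]]
            if higher:
                trial = bits.copy()
                trial[idx] = higher[0]
                if trial.mean() <= avg_budget:
                    bits = trial
    return bits.astype(int)

# Hypothetical unified scores for four layers under a 3-bit average budget:
print(allocate_bits([0.9, 0.2, 0.8, 0.1]))  # → [4 2 4 2]
```

The output illustrates the core LMPQ idea from the abstract: under an extreme low-bit budget, the two high-sensitivity layers keep 4-bit precision while the insensitive ones drop to 2 bits.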