
SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning

arXiv cs.CV / 3/20/2026

📰 News · Models & Research

Key Points

  • SynQ is a synthesis-aware fine-tuning framework for zero-shot quantization that quantizes pre-trained models without access to training data.
  • It overcomes three main ZSQ challenges by using a low-pass filter to reduce noise in synthetic samples, aligning the quantized model's class activation maps with the pre-trained model, and using soft labels for hard samples to mitigate misguidance from erroneous labels.
  • Extensive experiments show that SynQ achieves state-of-the-art accuracy compared with existing ZSQ methods.
  • By enabling accurate quantization without data, SynQ facilitates deploying compressed models on privacy- or security-constrained edge devices.
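The first of the three fixes above, suppressing noise in synthetic samples with a low-pass filter, can be sketched as a simple frequency-domain filter. The ideal circular mask and the `cutoff` parameter below are illustrative assumptions, not the paper's exact filter design:

```python
import numpy as np

def low_pass_filter(img, cutoff=0.25):
    """Suppress high-frequency noise in a single-channel image (H, W).

    `cutoff` is the fraction of the spectrum radius kept; SynQ's actual
    filter may differ -- this is an illustrative sketch only.
    """
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= cutoff * min(h, w) / 2  # keep only low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

# Example: filtering a noisy synthetic sample removes high-frequency energy.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
smooth = low_pass_filter(img)
```

Because the mask zeroes out high-frequency components, the filtered image always has less total energy than the input while preserving its coarse structure.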

Abstract

How can we accurately quantize a pre-trained model without any data? Quantization algorithms are widely used for deploying neural networks on resource-constrained edge devices. Zero-shot Quantization (ZSQ) addresses the crucial and practical scenario where training data are inaccessible for privacy or security reasons. However, three significant challenges hinder the performance of existing ZSQ methods: 1) noise in the synthetic dataset, 2) predictions based on off-target patterns, and 3) misguidance by erroneous hard labels. In this paper, we propose SynQ (Synthesis-aware Fine-tuning for Zero-shot Quantization), a carefully designed ZSQ framework that overcomes the limitations of existing methods. SynQ minimizes the noise in the generated samples by exploiting a low-pass filter. Then, SynQ trains the quantized model to improve accuracy by aligning its class activation map with that of the pre-trained model. Furthermore, SynQ mitigates misguidance from the pre-trained model's errors by leveraging only soft labels for difficult samples. Extensive experiments show that SynQ achieves state-of-the-art accuracy over existing ZSQ methods.
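The other two ingredients, CAM alignment and soft-labels-only supervision for difficult samples, can be combined into a single fine-tuning objective. The sketch below is a minimal interpretation: the confidence-based definition of a "difficult" sample, the `hard_threshold` heuristic, and the `cam_weight` coefficient are all assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def synq_style_loss(student_logits, teacher_logits,
                    student_cam, teacher_cam,
                    hard_threshold=0.5, cam_weight=1.0):
    """Illustrative SynQ-style objective (an assumed formulation).

    - Difficult samples (low teacher confidence) are supervised with the
      teacher's soft label distribution only, avoiding erroneous hard labels.
    - Confident samples additionally use the teacher's argmax as a hard label.
    - CAMs of the quantized (student) and pre-trained (teacher) models are
      aligned with a mean-squared-error term.
    """
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    # Soft-label cross-entropy against the teacher distribution.
    soft_ce = -np.sum(p_t * np.log(p_s + 1e-12), axis=-1)
    # Hard-label cross-entropy against the teacher's predicted class.
    y_hard = p_t.argmax(axis=-1)
    hard_ce = -np.log(p_s[np.arange(len(y_hard)), y_hard] + 1e-12)
    # Use hard labels only when the teacher is confident (assumed heuristic).
    use_hard = p_t.max(axis=-1) >= hard_threshold
    ce = np.where(use_hard, hard_ce, soft_ce)
    # Align the student's class activation map with the teacher's.
    cam_loss = np.mean((student_cam - teacher_cam) ** 2)
    return ce.mean() + cam_weight * cam_loss

# Tiny demo with random logits and CAMs.
rng = np.random.default_rng(1)
logits_t = rng.standard_normal((4, 10))
logits_s = logits_t + 0.1 * rng.standard_normal((4, 10))
cam_t = rng.random((4, 7, 7))
cam_s = cam_t + 0.05 * rng.standard_normal((4, 7, 7))
loss = synq_style_loss(logits_s, logits_t, cam_s, cam_t)
```

The CAM term is what steers the quantized model away from "off-target patterns": it is penalized whenever it attends to regions the pre-trained model does not.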