On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks

arXiv cs.LG / 4/23/2026


Key Points

  • The paper studies how post-training quantization (PTQ) techniques—specifically GPTQ and a modified Hessian-Aware Quantization (HAWQ)—affect diffusion-based coding LLMs under low-bit settings.
  • Experiments compare a diffusion coding LLM (CoDA) against its auto-regressive counterpart (Qwen3-1.7B) using a standardized evaluation pipeline.
  • CoDA shows notably better robustness at very low bitwidths (2–4 bits), with smaller accuracy drops on HumanEval and MBPP than the auto-regressive model.
  • The authors report that mixed-precision configurations derived from HAWQ enable smoother trade-offs among accuracy, latency, and memory, supporting more efficient deployment.
  • Overall, the findings suggest diffusion LLMs may be more resilient to quantization, improving feasibility for cost- and memory-constrained inference.
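The mixed-precision idea behind HAWQ can be illustrated with a toy allocator: layers with larger Hessian-based sensitivity scores get more bits, subject to an average-bitwidth budget. This is a minimal sketch under assumed names and a simple greedy scheme, not the paper's or HAWQ's actual algorithm.

```python
def allocate_bits(sensitivities, bit_options=(2, 4, 8), budget_bits=4.0):
    """Toy HAWQ-style mixed-precision bit allocation (illustrative only).

    Greedily upgrades the most sensitive layers to higher bitwidths while
    keeping the average bitwidth at or below `budget_bits`. The greedy
    scheme and all names here are assumptions for illustration.
    """
    n = len(sensitivities)
    # Start every layer at the lowest available precision.
    bits = [min(bit_options)] * n
    # Visit layers from most to least sensitive.
    order = sorted(range(n), key=lambda i: -sensitivities[i])
    for b in sorted(bit_options)[1:]:  # try upgrading to 4 bits, then 8
        for i in order:
            trial = bits.copy()
            trial[i] = b
            if sum(trial) / n <= budget_bits:  # respect the average budget
                bits = trial
    return bits

# Most sensitive layers get 4 bits; the least sensitive stays at 2.
print(allocate_bits([0.9, 0.1, 0.5, 0.05], budget_bits=3.5))  # → [4, 4, 4, 2]
```

Trading the budget up or down moves layers between precisions, which is the kind of accuracy/latency/memory knob the key points describe.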

Abstract

Auto-regressive Large Language Models (LLMs) achieve strong performance on coding tasks but incur high memory and inference costs. Diffusion-based language models (d-LLMs) offer bounded inference cost via iterative denoising, but their behavior under post-training quantization (PTQ) has been sparsely explored. We investigate the application and robustness of PTQ techniques, specifically GPTQ and a modified Hessian-Aware Quantization (HAWQ) algorithm, on a diffusion-based coding LLM (CoDA) and its auto-regressive counterpart, Qwen3-1.7B, under a standardized evaluation pipeline. In our setup, CoDA exhibits greater robustness at low bitwidths (2–4 bits), with smaller accuracy degradation across the HumanEval and MBPP benchmarks. Additionally, mixed-precision configurations derived from HAWQ provide smooth trade-offs across accuracy, latency, and memory. These results suggest that diffusion LLMs may offer advantages for efficient deployment due to greater quantization resilience.
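To see why the 2–4 bit regime is where robustness differences emerge, consider the simplest PTQ baseline: uniform symmetric round-to-nearest weight quantization. This sketch is illustrative only; GPTQ additionally corrects rounding error layer by layer using second-order (Hessian) information, which is omitted here.

```python
import random

def quantize_symmetric(weights, bits):
    """Uniform symmetric round-to-nearest quantization of a weight list.

    Maps each weight to one of 2**bits signed integer levels and back,
    using a single per-tensor scale. A minimal PTQ baseline for
    illustration, not the method evaluated in the paper.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax  # per-tensor scale factor
    def q(w):
        level = max(-qmax - 1, min(qmax, round(w / scale)))
        return level * scale                   # dequantize back to float
    return [q(w) for w in weights]

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1024)]
for bits in (8, 4, 2):
    deq = quantize_symmetric(weights, bits)
    mse = sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)
    print(f"{bits}-bit MSE: {mse:.5f}")
```

The reconstruction error grows sharply as the bitwidth drops toward 2 bits, which is precisely where the paper reports CoDA degrading less than its auto-regressive counterpart.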