Stability-Weighted Decoding for Diffusion Language Models

arXiv cs.CL · April 21, 2026

📰 News · Models & Research

Key Points

  • The paper argues that diffusion LLM decoding can fail when it uses static confidence scores from a single denoising step, causing premature unmasking of tokens that are unstable over time.
  • It defines a token's temporal instability as the KL divergence between its prediction distributions at consecutive denoising steps, and proves that this instability is a strict lower bound on the token's mutual information with the remaining masked context.
  • Based on this, the authors propose Stability-Weighted Decoding (SWD), a training-free, plug-and-play method that penalizes temporally unstable tokens via stability-aware token scoring.
  • Experiments on code generation and mathematical reasoning benchmarks report consistent accuracy gains across multiple scoring metrics and token selection policies.
  • SWD also shows strong robustness under faster generation settings (varying acceleration ratios), retaining a sizable advantage over standard baselines.
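The paper's exact scoring rule is not reproduced in this summary. As a rough sketch of the idea, the following assumes a max-probability confidence metric, an exponential penalty form, and a weight `alpha`, all of which are illustrative choices rather than the authors' formulation:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the vocabulary."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def stability_weighted_scores(probs_prev, probs_curr, alpha=1.0):
    """Down-weight each masked position's confidence by its temporal instability.

    probs_prev, probs_curr: (num_masked, vocab_size) prediction distributions
    from two consecutive denoising steps. alpha is a hypothetical penalty
    weight. Returns one score per position; higher = safer to unmask.
    """
    confidence = probs_curr.max(axis=1)           # static confidence metric
    instability = np.array([kl_divergence(c, p)   # KL between consecutive steps
                            for c, p in zip(probs_curr, probs_prev)])
    return confidence * np.exp(-alpha * instability)
```

Because the stability term only rescales an existing per-token score, the same modulation can in principle wrap any score-based selection policy, which is what makes the method plug-and-play.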

Abstract

Diffusion large language models (dLLMs) enable parallel text generation by iteratively denoising a fully masked sequence, unmasking a subset of masked tokens at each step. Existing decoding strategies rely on static confidence metrics computed at a single denoising step, ignoring temporal history and often leading to premature unmasking of unstable tokens. In this work, we theoretically establish that a token's temporal instability, quantified by the KL divergence between consecutive prediction distributions, provides a strict lower bound on its mutual information with the remaining masked context, indicating that temporally unstable tokens are inherently unsafe to unmask. Based on this insight, we propose Stability-Weighted Decoding (SWD), a training-free, plug-and-play strategy that incorporates temporal stability into token scoring and acts as a universal modulator for arbitrary score-based decoding policies. Experiments on code generation and mathematical reasoning benchmarks demonstrate that SWD consistently improves generation accuracy across representative scoring metrics and selection policies, and exhibits exceptional robustness, maintaining a significant performance lead over standard baselines across varying acceleration ratios.
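The abstract's central claim can be written schematically; the notation below (masked set $\mathcal{M}$, per-position distributions $p_i^{(t)}$) is assumed for illustration and may differ from the paper's. For a masked position $i$ with prediction distribution $p_i^{(t)}$ at denoising step $t$, the stated bound has the form

$$
I\!\left(x_i;\, x_{\mathcal{M}\setminus\{i\}}\right) \;\ge\; D_{\mathrm{KL}}\!\left(p_i^{(t)} \,\big\|\, p_i^{(t-1)}\right),
$$

i.e., a token whose distribution is still shifting between steps provably carries substantial mutual information with the not-yet-unmasked context, so committing to it early discards information the model has not yet resolved.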