Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

arXiv cs.CL / 4/3/2026


Key Points

The paper analyzes why diffusion LLMs, which can decode tokens in arbitrary orders, often lose generation quality under random-order decoding despite their theoretical flexibility for exploration.
It explains the “quality–exploration dilemma”: low-confidence remasking boosts single-sample quality (e.g., Pass@1) by favoring confident tokens, but it reduces exploration and can cap multi-sample improvements (e.g., Pass@k).
The authors offer a unified explanation of this dilemma: low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution.
To address the limitation, they derive an optimal distribution that explicitly balances quality and exploration and propose an Independent Metropolis–Hastings sampler to approximate that target during decoding.
Experiments on reasoning benchmarks (MATH500, AIME24/25, HumanEval, MBPP) indicate the proposed sampling approach improves the exploration–quality tradeoff versus both random decoding and low-confidence remasking.
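
To make the Pass@1 versus Pass@k tension concrete, here is a minimal Python sketch. It uses the widely used unbiased Pass@k estimator from code-generation evaluation, 1 - C(n-c, k) / C(n, k) for n samples of which c are correct; whether the paper computes Pass@k in exactly this way is an assumption. The two "decoders" and their per-problem correct counts are hypothetical numbers chosen for illustration, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n: int, k: int) -> float:
    """Average Pass@k over a list of per-problem correct counts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data: 10 problems, 16 samples each.
# "Collapsed" decoder: samples agree, so a problem is solved 16/16 or 0/16 times.
COLLAPSED = [16] * 6 + [0] * 4
# "Diverse" decoder: lower per-sample accuracy, but every problem is solved sometimes.
DIVERSE = [8] * 10

if __name__ == "__main__":
    for k in (1, 8):
        print(k, mean_pass_at_k(COLLAPSED, 16, k), mean_pass_at_k(DIVERSE, 16, k))
```

In this toy setup the collapsed decoder scores 0.6 at both k = 1 and k = 8, while the diverse decoder scores 0.5 at k = 1 but nearly 1.0 at k = 8. That is the shape of the dilemma described above: confidence-driven decoding can win on single-sample quality and still cap multi-sample gains.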

Abstract

Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality–exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple Independent Metropolis–Hastings sampler that approximately targets this distribution during decoding. Experiments across a range of reasoning benchmarks including MATH500, AIME24/25, HumanEval, and MBPP show that our approach yields better exploration-quality tradeoff than both random and low-confidence remasking.
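
For readers unfamiliar with Independent Metropolis–Hastings, the sketch below shows the generic textbook algorithm on a three-state toy problem. It is not the paper's sampler: the abstract does not state the target distribution or the proposal, so `TARGET`, `PROPOSAL`, and all function names here are hypothetical stand-ins. The proposal plays the role of an over-confident decoder concentrated on one outcome, and the target is a flatter distribution the chain should match.

```python
import random
from collections import Counter

# Hypothetical toy distributions over three candidate outputs.
PROPOSAL = {"a": 0.7, "b": 0.2, "c": 0.1}  # peaked, "confident" proposal q
TARGET = {"a": 5.0, "b": 3.0, "c": 2.0}    # unnormalized target weights (0.5, 0.3, 0.2)


def propose(rng: random.Random) -> str:
    """Draw a candidate from q, independently of the current state."""
    return rng.choices(list(PROPOSAL), weights=list(PROPOSAL.values()))[0]


def independent_mh(n_steps: int, rng: random.Random) -> list:
    """Generic Independent Metropolis-Hastings chain targeting TARGET."""
    x = propose(rng)
    samples = []
    for _ in range(n_steps):
        y = propose(rng)
        # Acceptance ratio w(y) / w(x), with importance weight w = target / proposal.
        ratio = (TARGET[y] * PROPOSAL[x]) / (TARGET[x] * PROPOSAL[y])
        if rng.random() < min(1.0, ratio):
            x = y  # accept the candidate
        samples.append(x)  # on rejection the current state is repeated
    return samples


def empirical_freqs(samples: list) -> dict:
    """Fraction of chain steps spent in each state."""
    counts = Counter(samples)
    return {s: counts[s] / len(samples) for s in PROPOSAL}


if __name__ == "__main__":
    print(empirical_freqs(independent_mh(50_000, random.Random(0))))
```

Because candidates are drawn without reference to the current state, the accept/reject step only needs the ratio of target to proposal weights, so the target can be left unnormalized. The empirical frequencies should land close to 0.5, 0.3, and 0.2 rather than the proposal's 0.7, 0.2, and 0.1.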

