Attention-Based Sampler for Diffusion Language Models

arXiv cs.CL / 4/13/2026


Key Points

  • The paper addresses limitations of autoregressive decoding by studying how diffusion-based LLMs can choose decoding order beyond token-level signals.
  • It proves that decoding tokens in descending order of their attention-matrix column sums approximately maximizes sequence log-likelihood.
  • Based on this theory, the authors introduce Attn-Sampler, a training-free attention-guided decoding algorithm intended to improve generation quality over greedy approaches.
  • To make the method practical and faster, they propose a block attention approximation and dynamic attention thresholding to accelerate decoding while preserving benefits.
  • Experiments on multiple benchmarks show improved generation quality and increased decoding parallelism compared with existing decoding strategies.
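The ordering criterion at the heart of the method can be sketched with a toy example. This is a minimal illustration of the idea, not the paper's implementation: it assumes a single attention matrix where rows are queries and columns are keys, so a column sum measures how much total attention a token receives.

```python
import numpy as np

def attention_order(attn: np.ndarray) -> np.ndarray:
    """Return token positions sorted by descending attention column sum.

    attn: (seq_len, seq_len) attention matrix; rows are queries, columns
    are keys. A large column sum means many positions attend to that
    token, which serves here as a proxy for decoding priority.
    """
    col_sums = attn.sum(axis=0)   # total attention received per token
    return np.argsort(-col_sums)  # positions, highest column sum first

# Toy 4-token attention matrix (rows sum to 1, as after softmax).
attn = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.3, 0.3, 0.3, 0.1],
    [0.1, 0.4, 0.3, 0.2],
])
print(attention_order(attn))  # → [1 2 0 3]
```

Here token 1 receives the most total attention (column sum 1.8), so it would be decoded first; token 3 (column sum 0.5) would be decoded last.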

Abstract

Auto-regressive models (ARMs) have established a dominant paradigm in language modeling. However, their strictly sequential decoding paradigm imposes fundamental constraints on both inference efficiency and modeling flexibility. To address these limitations, diffusion-based large language models (dLLMs) have been proposed, offering the potential for parallel decoding and flexible language modeling. Despite these advantages, current dLLM decoding strategies rely primarily on token-level information, which fails to account for global sequence structure and often yields suboptimal results. In this paper, we study the decoding order selection problem from the perspective of log-likelihood maximization. We theoretically demonstrate that optimal sequence likelihood can be approximately achieved by decoding tokens in descending order of their attention-matrix column sums. This finding provides a principled justification for attention-guided decoding and offers a theoretically grounded alternative to greedy search. We instantiate this theoretical insight in a new training-free decoding algorithm, termed Attn-Sampler, and further propose a block attention approximation and dynamic attention thresholding for practical acceleration. Extensive experiments across multiple benchmarks validate the effectiveness of our proposed method, demonstrating that it achieves superior generation quality while enhancing decoding parallelism.
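The dynamic-thresholding idea mentioned above can be illustrated with a small sketch. All names here are hypothetical, and the paper's actual thresholding rule may differ: the assumption is that, instead of unmasking one token per step, every still-masked position whose attention column sum is within a factor `tau` of the current maximum is decoded in the same step, trading a strict ordering for parallelism.

```python
import numpy as np

def select_parallel(attn: np.ndarray, masked: list[int], tau: float) -> list[int]:
    """Pick masked positions to decode together in one step.

    attn: (seq_len, seq_len) attention matrix; masked: indices still to
    be decoded; tau in (0, 1]: positions whose column sum is at least
    tau times the best masked column sum are unmasked together.
    (tau is an illustrative knob, not a parameter from the paper.)
    """
    col_sums = attn.sum(axis=0)                 # attention received per token
    best = max(col_sums[i] for i in masked)     # strongest masked position
    return [i for i in masked if col_sums[i] >= tau * best]

attn = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.3, 0.3, 0.3, 0.1],
    [0.1, 0.4, 0.3, 0.2],
])
print(select_parallel(attn, masked=[0, 1, 2, 3], tau=0.5))  # → [1, 2]
```

With `tau = 1.0` this degenerates to one token per step (pure ordered decoding); lowering `tau` widens each step and increases parallelism at some risk to quality, which matches the efficiency/quality trade-off the abstract describes.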