AI Navigate

Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

arXiv cs.LG / 3/16/2026


Key Points

  • Dependency-Aware Parallel Decoding (DAPD) is proposed for diffusion LLMs to enable parallel token unmasking by constructing a conditional dependency graph via self-attention.
  • The method is training-free and requires no auxiliary models or retraining, reducing the complexity of parallel decoding.
  • At each iteration, DAPD treats token interactions as edges in a graph, with independent sets of tokens selected to be unmasked in parallel.
  • Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off and enables more globally distributed parallel updates that better exploit any-order generation.
  • This approach could lead to more efficient inference for diffusion-based LLMs and influence future decoding strategies.

Abstract

Parallel decoding for diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs.
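The selection step described above can be sketched in a few lines: threshold a self-attention matrix to get a dependency graph over masked positions, then greedily pick a high-confidence independent set to unmask in parallel. The edge threshold, the attention aggregation, and the confidence ordering here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_independent_set(attn, masked_idx, confidence, tau=0.5):
    """Greedy sketch of dependency-aware parallel token selection.

    attn:       (L, L) self-attention matrix, assumed already averaged
                over heads/layers (an assumption, not from the paper).
    masked_idx: positions that are still masked at this step.
    confidence: per-position confidence of the model's marginal prediction.
    tau:        edge threshold (illustrative value).
    """
    # Symmetrize: treat strong attention in either direction as a dependency edge.
    sym = np.maximum(attn, attn.T)
    selected = []
    # Visit masked positions from most to least confident; keep a position only
    # if it has no strong edge to any already-selected position, so the final
    # set is independent in the dependency graph.
    for i in sorted(masked_idx, key=lambda i: -confidence[i]):
        if all(sym[i, j] < tau for j in selected):
            selected.append(i)
    return selected  # unmask these positions in parallel this iteration
```

In a full decoder this selection would run once per denoising iteration, with the graph rebuilt from the current step's attention, so strongly coupled tokens are never co-updated in the same step.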