Accelerating Speculative Decoding with Block Diffusion Draft Trees

arXiv cs.CL / 4/15/2026

Key Points

Speculative decoding speeds up autoregressive language models by having a lightweight drafter propose multiple future tokens that the target model then verifies in parallel.
DFlash introduces a block diffusion drafter that can produce an entire draft block in a single forward pass, achieving state-of-the-art speculative decoding results.
The paper notes that vanilla DFlash still verifies only one drafted trajectory per round, which can restrict the achievable acceptance length.
It proposes DDTree (Diffusion Draft Tree), which builds a draft tree from the per-position distributions of a block diffusion drafter and selects likely continuations under a fixed node budget using a best-first heap strategy.
DDTree verifies the resulting tree efficiently in a single target-model forward pass via an ancestor-only attention mask, and is positioned as a leading speculative-decoding approach built on DFlash.
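The best-first selection under a node budget can be illustrated with a short Python sketch. This is not the paper's implementation: it assumes the surrogate score of a tree node is the product of the drafter's per-position probabilities along its path (the summary only says the surrogate is defined by the draft model's output), and the names `build_draft_tree`, `position_probs`, and `node_budget` are hypothetical.

```python
import heapq
import math


def build_draft_tree(position_probs, node_budget):
    """Pick up to node_budget tree nodes in order of decreasing surrogate score.

    position_probs[d] maps token -> drafter probability at draft position d.
    ASSUMPTION: a node's score is the sum of log-probabilities on its path.
    Returns a list of (parent_index, token, log_score); parent_index is -1
    for nodes at the first draft position.
    """
    # Rank the candidate tokens at each position, most probable first.
    ranked = [
        sorted(((t, p) for t, p in probs.items() if p > 0), key=lambda tp: -tp[1])
        for probs in position_probs
    ]
    nodes = []
    if node_budget <= 0 or not ranked or not ranked[0]:
        return nodes

    tie = 0  # unique tie-breaker so heap comparisons stop at this field
    # Heap entry: (-log_score, tie, parent_index, parent_log_score, depth, rank)
    heap = [(-math.log(ranked[0][0][1]), tie, -1, 0.0, 0, 0)]
    while heap and len(nodes) < node_budget:
        neg_score, _, parent, parent_score, depth, rank = heapq.heappop(heap)
        score = -neg_score
        index = len(nodes)
        nodes.append((parent, ranked[depth][rank][0], score))

        # Best child: extend this path with the top token of the next position.
        if depth + 1 < len(ranked) and ranked[depth + 1]:
            tie += 1
            child = score + math.log(ranked[depth + 1][0][1])
            heapq.heappush(heap, (-child, tie, index, score, depth + 1, 0))

        # Next sibling: same parent, next-best token at the same position.
        if rank + 1 < len(ranked[depth]):
            tie += 1
            sibling = parent_score + math.log(ranked[depth][rank + 1][1])
            heapq.heappush(heap, (-sibling, tie, parent, parent_score, depth, rank + 1))
    return nodes


tree = build_draft_tree([{"a": 0.6, "b": 0.4}, {"c": 0.9, "d": 0.1}], node_budget=4)
```

Each popped node pushes at most two successors (its best child and its next sibling), so the heap stays small and every candidate path is generated exactly once. In the toy call above, the four selected nodes are `a`, `a→c`, `b`, and `b→c`, in that order.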

Abstract

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art speculative decoding performance, outperforming strong autoregressive drafters such as EAGLE-3. Vanilla DFlash, however, still verifies only a single drafted trajectory per round, potentially limiting its acceptance length. We introduce DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, DDTree uses a simple best-first heap algorithm to select the continuations that are most likely to match the target model according to a surrogate defined by the draft model's output. The resulting tree is verified efficiently in a single target model forward pass using an ancestor-only attention mask. Because DDTree builds on DFlash, a leading draft model for speculative decoding, these gains place DDTree among the leading approaches to speculative decoding.
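To make the ancestor-only attention mask concrete, here is a minimal Python sketch, not taken from the paper. It assumes the draft tree is given as a hypothetical `parents` list in which each node's parent appears earlier in the list and `-1` marks a node with no drafted parent. It covers only attention among drafted nodes; in a real verifier every drafted node would presumably also attend to the already-accepted prefix.

```python
def ancestor_only_mask(parents):
    """Return mask[i][j] == True iff node j is node i itself or an ancestor of i.

    parents[i] is the index of node i's parent, or -1 if it has none.
    ASSUMPTION: parents precede their children (parents[i] < i).
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i, parent in enumerate(parents):
        if parent >= i:
            raise ValueError("each parent must appear before its child")
        if parent >= 0:
            # A node sees everything its parent sees...
            mask[i] = list(mask[parent])
        # ...plus itself.
        mask[i][i] = True
    return mask


# Two branches: node 1 is a child of node 0, node 3 is a child of node 2.
mask = ancestor_only_mask([-1, 0, -1, 2])
```

Because each row is restricted to a single root-to-node path, every branch is scored as if it were the only continuation, which is what allows all branches to share one target-model forward pass.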
