Accelerating Speculative Decoding with Block Diffusion Draft Trees
arXiv cs.CL / 4/15/2026
Key Points
- Speculative decoding speeds up autoregressive language models by having a lightweight drafter propose multiple future tokens that the target model then verifies in parallel.
- DFlash introduces a block diffusion drafter that can produce an entire draft block in a single forward pass, achieving state-of-the-art speculative decoding results.
- The paper notes that vanilla DFlash still verifies only one drafted trajectory per round, which can restrict the achievable acceptance length.
- It proposes DDTree (Diffusion Draft Tree), which builds a draft tree from the per-position distributions of a block diffusion drafter and selects likely continuations under a fixed node budget using a best-first heap strategy.
- DDTree verifies the whole tree in a single target-model forward pass by using an ancestor-only attention mask, under which each drafted node attends only to itself and the nodes on its path back to the root. The authors present it as a leading speculative-decoding approach built on DFlash.
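
The tree construction and masking described in the points above can be illustrated with a small sketch. This is not the paper's implementation. It assumes that a node's score is the product of the drafter's per-position probabilities along its path, that only the `top_k` most likely tokens per position are considered, and that the tree is stored as a flat list with parent indices. The names `build_draft_tree`, `ancestor_mask`, `node_budget`, and `top_k` are hypothetical and chosen for illustration only.

```python
import heapq
import math


def build_draft_tree(position_probs, node_budget, top_k=4):
    """Best-first draft tree under a fixed node budget (illustrative sketch).

    position_probs: list of dicts {token: prob}, one dict per draft position,
                    standing in for a block drafter's per-position output.
    Returns a flat list of nodes; parent == -1 means "child of the root",
    where the root is the already accepted context.
    Assumption: a path's score is the product of its per-position probabilities.
    """
    # Keep only the top_k candidates at each position, most likely first.
    candidates = [
        sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for dist in position_probs
    ]
    nodes = []
    heap = []      # entries: (negative log-prob, tie-breaker, token, parent, depth)
    counter = 0    # tie-breaker so the heap never has to compare tokens

    if candidates:
        for tok, p in candidates[0]:
            if p > 0:
                heapq.heappush(heap, (-math.log(p), counter, tok, -1, 0))
                counter += 1

    while heap and len(nodes) < node_budget:
        neg_lp, _, tok, parent, depth = heapq.heappop(heap)
        idx = len(nodes)
        nodes.append(
            {"token": tok, "parent": parent, "depth": depth, "logprob": -neg_lp}
        )
        # A child can never outscore its parent, so parents are always
        # popped (and indexed) before their children.
        if depth + 1 < len(candidates):
            for child_tok, p in candidates[depth + 1]:
                if p > 0:
                    heapq.heappush(
                        heap,
                        (neg_lp - math.log(p), counter, child_tok, idx, depth + 1),
                    )
                    counter += 1
    return nodes


def ancestor_mask(nodes):
    """mask[i][j] is True iff node j is node i itself or one of its ancestors."""
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:          # walk up the parent chain to the root
            mask[i][j] = True
            j = nodes[j]["parent"]
    return mask


# Toy per-position distributions for a block of three draft positions.
toy_probs = [
    {"a": 0.6, "b": 0.4},
    {"c": 0.7, "d": 0.3},
    {"e": 0.9, "f": 0.1},
]
tree = build_draft_tree(toy_probs, node_budget=5, top_k=2)
mask = ancestor_mask(tree)
```

In a real verification pass, every tree node would also attend to the shared context prefix, which this mask leaves out for brevity. The flat parent-index layout is one possible design choice: it lets the mask be built with a simple walk up each node's parent chain.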