S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

arXiv cs.CL / 3/27/2026

Key Points

  • S2D2 is a training-free self-speculative decoding method for block-diffusion LLMs that improves the accuracy-speed tradeoff in the few-step regime where confidence-thresholding is brittle.
  • It leverages the insight that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to serve as both drafter and verifier.
  • During decoding, S2D2 inserts a lightweight speculative verification step with routing policies that decide when verification is cost-effective (see the sketch after this list).
  • Experiments on three mainstream block-diffusion families show consistent gains over confidence-threshold baselines, including up to 4.7× speedup over autoregressive decoding on SDAR and accuracy improvements of up to 4.5 points over a tuned dynamic baseline.
  • For LLaDA2.1-Mini, S2D2 complements built-in self-correction and can deliver up to 4.4× faster decoding than a static baseline with slightly higher accuracy.
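
A minimal sketch of one plausible routing rule, assuming per-token drafter confidences are available; the statistic and the threshold below are illustrative assumptions, not the paper's exact policy:

```python
def should_verify(confidences: list[float], tau: float = 0.9) -> bool:
    """Hypothetical routing gate: send a drafted block to autoregressive
    verification only when the drafter is unsure, so the extra verifier
    pass is spent where it is most likely to pay off."""
    mean_conf = sum(confidences) / max(len(confidences), 1)
    return mean_conf < tau
```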

Abstract

Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion decoding and uses lightweight routing policies to decide when verification is worth its cost. This yields a hybrid decoding trajectory in which diffusion proposes tokens in parallel, while the autoregressive mode acts as a local sequence-level critic. Across three mainstream block-diffusion families, S2D2 consistently improves the accuracy-speed tradeoff over strong confidence-thresholding baselines. On SDAR, we observe up to 4.7× speedup over autoregressive decoding, and up to 1.57× over a tuned dynamic decoding baseline while improving accuracy by up to 4.5 points. On LLaDA2.1-Mini, S2D2 remains complementary to built-in self-correction, including a conservative setting where it is 4.4× faster than the static baseline with slightly higher accuracy.
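
The hybrid trajectory the abstract describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `draft_block` and `ar_logprob` are hypothetical wrappers around the same pretrained model (parallel denoising versus block-size-one autoregressive scoring), and the fixed log-probability acceptance threshold stands in for the paper's verification rule.

```python
from typing import Callable, List
import math

def self_speculative_decode(
    prompt: List[int],
    draft_block: Callable[[List[int], int, int], List[int]],  # hypothetical: parallel denoising
    ar_logprob: Callable[[List[int], int], float],            # hypothetical: block size 1, same weights
    block_size: int = 8,
    denoise_steps: int = 2,                  # few-step regime: cheap, possibly noisy drafts
    accept_logprob: float = math.log(0.5),   # illustrative acceptance threshold
    max_new_tokens: int = 256,
    eos_id: int = 2,
) -> List[int]:
    """Draft-then-verify decoding with a single block-diffusion model.

    Diffusion mode proposes a block of tokens in parallel (the drafter); the
    same weights run with block size one re-score the draft left-to-right as
    an autoregressive critic (the verifier). Only a verified prefix is kept.
    """
    ctx = list(prompt)
    while len(ctx) - len(prompt) < max_new_tokens:
        # 1) Drafter: denoise a whole block in a few parallel steps.
        draft = draft_block(ctx, block_size, denoise_steps)
        # 2) Verifier: same model, block size one, scores tokens sequentially.
        accepted: List[int] = []
        for tok in draft:
            if ar_logprob(ctx + accepted, tok) >= accept_logprob:
                accepted.append(tok)
            else:
                break  # reject the remainder; it will be re-drafted next round
        # 3) Ensure progress even when the verifier rejects everything.
        if not accepted:
            accepted = draft[:1]
        ctx.extend(accepted)
        if eos_id in accepted:
            break
    return ctx
```

A practical version would score the whole draft in a single verifier forward pass rather than token by token, and would apply a routing gate like the one sketched under Key Points before step 2, skipping verification when the draft is already confident.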