Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

arXiv cs.CL / 4/1/2026


Key Points

  • The paper introduces SlowFast Sampling, a dynamic decoding strategy for diffusion-based LLMs that alternates between exploratory and accelerated stages to address static behavior in prior sampling methods.
  • It defines three guiding “golden principles”—the certainty, convergence, and positional principles—that decide when and where tokens can be decoded confidently and efficiently.
  • The approach is further combined with dLLM-Cache to cut redundant computation during inference.
  • Experiments across benchmarks report speedups of up to 15.63× on LLaDA with minimal accuracy loss, and up to 34.22× when using the caching integration.
  • The method is shown to outperform strong autoregressive baselines such as LLaMA3 8B in throughput, highlighting sampling design as a key lever for realizing diffusion LLM efficiency.
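To make the alternating behavior concrete, here is a minimal, hypothetical sketch of a slow/fast decode loop. This is not the paper's implementation: the function names, the fixed number of slow steps, the confidence threshold `tau`, and the contiguous-run heuristic standing in for the convergence and positional principles are all illustrative assumptions.

```python
def slowfast_sample(confidence_fn, seq_len, tau=0.9, slow_steps=2):
    """Illustrative slow-then-fast decode loop (hypothetical sketch,
    not the paper's algorithm). Masked positions are committed either
    one at a time (slow/exploratory) or in bulk once confident (fast)."""
    decoded = [None] * seq_len  # None marks a still-masked position
    steps = 0
    while None in decoded:
        conf = confidence_fn(decoded)  # per-position confidence scores
        masked = [i for i, t in enumerate(decoded) if t is None]
        if steps < slow_steps:
            # Slow stage (certainty principle): commit only the single
            # most confident masked position, exploring the sequence.
            i = max(masked, key=lambda j: conf[j])
            decoded[i] = f"tok{i}"
        else:
            # Fast stage (convergence + positional principles): commit the
            # leftmost contiguous run of masked positions above tau.
            run = []
            for i in masked:
                if conf[i] >= tau and (not run or i == run[-1] + 1):
                    run.append(i)
                else:
                    break
            if not run:  # nothing confident enough: fall back to one token
                run = [max(masked, key=lambda j: conf[j])]
            for i in run:
                decoded[i] = f"tok{i}"
        steps += 1
    return decoded, steps
```

With a uniformly high-confidence model, the loop spends `slow_steps` iterations committing one token each, then finishes the rest in a single fast step, which is the source of the reported speedups over purely static per-step decoding.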

Abstract

Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles: certainty principle, convergence principle, and positional principle, which govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63× speedup on LLaDA with minimal accuracy drop, and up to 34.22× when combined with caching. Notably, our approach outperforms strong autoregressive baselines like LLaMA3 8B in throughput, demonstrating that well-designed sampling can unlock the full potential of dLLMs for fast and high-quality generation.