Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct

arXiv cs.CL / 3/13/2026

Key Points

Introduces Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that distills a few-step student from a pre-trained diffusion LLM to enable fast generation.
Builds on integral KL-divergence minimization and adds grouped reward normalization, intermediate-state matching, and a reward-guided ancestral sampler to improve training stability, model coverage, and inference quality.
Reports that the distilled model matches or surpasses its diffusion teacher and the GPT-2 baseline while delivering up to 64x acceleration, and that it cuts additional training wall-clock time by more than 20x relative to competing dLLM distillation methods.
On OpenWebText, it reports perplexity ranging from 62.2 at 8 NFEs (number of function evaluations) to 18.4 at 128 NFEs, with an entropy loss of around 1%. Robustness is further supported by ablation studies, model scaling, downstream task evaluations, and unconditional protein sequence generation.
Overall, the work argues that DiDi-Instruct enables efficient and effective distillation for language generation with practical impact on speed and resource use.
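
The key points name grouped reward normalization without defining it. As a rough illustration only, the sketch below assumes one common reading: rewards are standardized within each group of samples (subtract the group mean, divide by the group standard deviation) so that reward scale does not destabilize training. The function name `grouped_reward_normalization`, the `eps` parameter, and the group layout (one row per group) are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def grouped_reward_normalization(rewards, eps=1e-8):
    """Standardize rewards within each group (hypothetical helper).

    rewards: array-like of shape (num_groups, group_size).
    Returns an array of the same shape in which every row has
    (approximately) zero mean and unit standard deviation.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)  # per-group mean
    std = rewards.std(axis=1, keepdims=True)    # per-group population std
    # eps guards against division by zero when a group is constant.
    return (rewards - mean) / (std + eps)


if __name__ == "__main__":
    raw = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]
    print(grouped_reward_normalization(raw))
```

Normalizing per group rather than over the whole batch keeps groups with very different raw reward ranges on a comparable scale; whether DiDi-Instruct uses exactly this form is not stated in the summary.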

Abstract

Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained diffusion large language model (dLLM) and distills a few-step student for fast generation. The model distilled with DiDi-Instruct matches or surpasses its dLLM teacher and the GPT-2 baseline while providing up to $64\times$ acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which leads to a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler to improve training stability, model coverage, and inference quality. On the OpenWebText benchmark, DiDi-Instruct achieves perplexity ranging from 62.2 (8 NFEs) to 18.4 (128 NFEs), outperforming prior accelerated dLLMs and the GPT-2 baseline. These gains incur a negligible entropy loss (around $1\%$) and reduce additional training wall-clock time by more than $20\times$ compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, downstream task evaluations, and unconditional protein sequence generation. In conclusion, DiDi-Instruct enables efficient and effective distillation for language generation in the blink of an eye.
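
The abstract refers to integral KL-divergence minimization without writing the objective out. As a hedged sketch of the general shape such an objective usually takes, the expression below integrates a time-weighted KL divergence between the student's and the teacher's marginals along the diffusion trajectory. All symbols are assumptions introduced here for illustration: $q_\nu(\cdot, t)$ for the student marginal at time $t$, $q_\theta(\cdot, t)$ for the teacher marginal, and $\omega(t) > 0$ for a weighting function. The paper's own notation and weighting may differ.

```latex
\mathcal{D}_{\mathrm{IKL}}(q_\nu \,\|\, q_\theta)
  = \int_0^1 \omega(t)\,
    \mathrm{KL}\!\left( q_\nu(\cdot, t) \,\|\, q_\theta(\cdot, t) \right)
    \mathrm{d}t
```

Matching the two distributions at every noise level $t$, rather than only at the final clean output, gives the few-step student a training signal across the whole trajectory.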
