AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation

arXiv cs.CV / 5/1/2026


Key Points

  • The paper introduces AdvDMD, a method that unifies Distribution Matching Distillation (DMD) with reinforcement learning (RL) to improve few-step diffusion generation quality.
  • AdvDMD repurposes the adversarially trained discriminator from DMD2 as a reward model: it assigns low scores to generated images and high scores to real ones, and this score serves as the reward signal for the distilled generator (see the sketch after this list).
  • The reward model is trained using both intermediate and final denoising states and is updated online with the distilled model to reduce issues like reward hacking.
  • A unified SDE backward simulation and distinct training schedules for the DMD and RL objectives stabilize the combined DMD+RL training and make it more efficient.
  • Experiments show that 4-step AdvDMD beats the original 40-step SD3.5 model on DPG-Bench and yields significant gains for SD3 on GenEval, while 2-step AdvDMD outperforms TwinFlow on Qwen-Image.
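To make the reward mechanism concrete, here is a minimal PyTorch sketch of a discriminator-as-reward setup in the spirit of the bullets above. The names (`ToyDiscriminator`, `reward_and_disc_losses`), the MLP architecture, and the hinge objective are all illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ToyDiscriminator(nn.Module):
    """Toy stand-in for the DMD2 discriminator: scores a (sample, timestep)
    pair, so intermediate and final denoising states can share one reward
    model. A real discriminator would operate on image features."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on the noise level by concatenating the timestep.
        t = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t], dim=-1)).squeeze(-1)

def reward_and_disc_losses(disc, fake, real, t):
    """Hinge loss trains the discriminator to score real samples high and
    generated ones low; its score on generated samples is then reused as
    the reward the distilled generator tries to maximize."""
    d_real = disc(real, t)
    d_fake = disc(fake.detach(), t)  # stop-grad: discriminator update only
    disc_loss = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()
    reward = disc(fake, t).mean()    # generator minimizes -reward
    return disc_loss, reward

# Usage with dummy data:
disc = ToyDiscriminator()
t = torch.tensor([[0.5]])            # shared noise level for the batch
disc_loss, reward = reward_and_disc_losses(
    disc, fake=torch.randn(8, 16), real=torch.randn(8, 16), t=t
)
```

In practice `disc_loss` and `-reward` would be minimized by separate optimizers for the discriminator and the distilled generator; updating the discriminator online alongside the generator is what lets the reward track the generator's current failure modes, which is how the paper mitigates reward hacking.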

Abstract

Diffusion models offer superior generation quality at the expense of extensive sampling steps. Distillation methods, with Distribution Matching Distillation (DMD) as a popular example, can mitigate this issue, but performance degradation remains pronounced when sampling steps are limited. Reinforcement learning (RL) has been leveraged to improve few-step generation quality during distillation, with the potential to even surpass the performance of the teacher model. However, existing approaches are combinatorial in nature, merely integrating an RL process with the distillation process, which introduces unnecessary complexity. To address this gap, we propose AdvDMD, a method that seamlessly unifies DMD distillation and RL. Specifically, AdvDMD employs the adversarially trained discriminator from DMD2 as the reward model, which assigns low scores to generated images and high scores to real ones. It is trained on both intermediate and final states of the denoising process and updated online with the distilled model, enabling holistic supervision of the sampling trajectories and mitigating reward hacking. We adopt a unified SDE backward simulation and different training schedules for DMD and RL to enable more stable and efficient training. Experimental results demonstrate that the 4-step AdvDMD outperforms the original 40-step model for SD3.5 on DPG-Bench, while achieving significant performance gains for SD3 on GenEval. On Qwen-Image, our 2-step AdvDMD outperforms TwinFlow.
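How the unified SDE backward simulation and the separate DMD/RL schedules might fit together can be pictured with the sketch below. The re-noising rule, the `rl_every` cadence, and all function names (`sde_backward_states`, `training_step`, `dmd_loss_fn`, `reward_fn`) are assumptions for illustration; the paper's actual simulation and schedule are not reproduced here.

```python
import torch

def sde_backward_states(student, x_t, timesteps, noise_scale=0.5):
    """Few-step stochastic (SDE-style) backward simulation: the student
    predicts a clean sample at each step, and fresh noise is injected at
    the next (lower) noise level. Every intermediate prediction is kept so
    one trajectory can feed both the DMD loss and the reward model."""
    states, x = [], x_t
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        x0_pred = student(x, t_cur)  # denoising prediction at t_cur
        # Re-noising rule below is an assumption for illustration only.
        x = x0_pred + noise_scale * t_next * torch.randn_like(x0_pred)
        states.append((x0_pred, t_cur))
    states.append((student(x, timesteps[-1]), timesteps[-1]))  # final state
    return states

def training_step(step, states, dmd_loss_fn, reward_fn, rl_every=5):
    """Illustrative schedule: the DMD loss is applied every step, the RL
    (reward) term only every rl_every-th step; this ratio is a placeholder,
    not the cadence used in the paper."""
    loss = sum(dmd_loss_fn(x0, t) for x0, t in states)
    if step % rl_every == 0:
        loss = loss - sum(reward_fn(x0, t) for x0, t in states)
    return loss
```

Because every `(x0_pred, t)` pair is scored, the reward model supervises the whole sampling trajectory rather than only the final image, which matches the holistic supervision the abstract describes.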