TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation

arXiv cs.CV / 4/17/2026

Key Points

  • The paper introduces TurboTalk, a progressive distillation framework designed to convert a multi-step audio-driven talking-avatar diffusion model into a single-step generator.
  • It uses a two-stage approach: first applying Distribution Matching Distillation (DMD) to train a stable 4-step “student,” then using adversarial distillation to progressively reduce the denoising steps from 4 down to 1 (a sketch of the DMD update follows this list).
  • To prevent training instability during extreme step reduction, TurboTalk adds progressive timestep sampling and a self-compare adversarial objective that stabilize the distillation process (timestep sampling is sketched after this list; the self-compare objective after the abstract).
  • Experiments report single-step video generation with a claimed 120× inference speedup while maintaining high generation quality.
  • The work targets practical deployment constraints by substantially reducing computational overhead inherent in multi-step denoising pipelines.
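
The paper's code is not part of this digest; as a rough illustration of the first stage, the PyTorch-style sketch below shows one common practical form of the Distribution Matching Distillation update, where the student's sample is re-noised and pulled along the difference between a teacher ("real") score model and a score model fit to student outputs ("fake"). All names here (generator, real_score_net, fake_score_net, noise_sched) are assumptions, not TurboTalk's actual interfaces.

    import torch
    import torch.nn.functional as F

    def dmd_generator_loss(generator, real_score_net, fake_score_net,
                           noise_sched, z, audio):
        # Student produces a clean sample in few steps.
        x = generator(z, audio)
        # Re-noise it at a random diffusion timestep.
        t = torch.randint(20, 980, (x.shape[0],), device=x.device)
        noise = torch.randn_like(x)
        x_t = noise_sched.add_noise(x, noise, t)
        with torch.no_grad():
            # Denoised predictions under the teacher ("real") score
            # model and the score model tracking the student ("fake").
            pred_real = real_score_net(x_t, t, audio)
            pred_fake = fake_score_net(x_t, t, audio)
        # Distribution-matching gradient: move the sample away from the
        # fake score and toward the real one, implemented here as an
        # MSE toward a stop-gradient target.
        target = (x - (pred_fake - pred_real)).detach()
        return F.mse_loss(x, target)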
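The exact progressive timestep sampling rule is not given in the abstract; one plausible reading, sketched below with placeholder timestep values, is that each distillation phase starts from the timesteps the student already handles and gradually unlocks the rest of the target schedule.

    import random

    # Placeholder denoising schedules for the progressive stage: the
    # 4-step student is distilled to 2 steps, then to 1. The concrete
    # timestep values are illustrative, not taken from the paper.
    STEP_SCHEDULES = {4: [999, 749, 499, 249], 2: [999, 499], 1: [999]}

    def sample_progressive_timestep(num_steps, progress):
        # Early in a phase (progress near 0), draw only the
        # highest-noise timestep; widen coverage to the full schedule
        # as training advances, so the student never confronts all new
        # timesteps at once.
        schedule = STEP_SCHEDULES[num_steps]
        unlocked = max(1, round(progress * len(schedule)))
        return random.choice(schedule[:unlocked])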

Abstract

Existing audio-driven digital-human video generation models rely on multi-step denoising, resulting in substantial computational overhead that severely limits their deployment in real-world settings. While one-step distillation approaches can significantly accelerate inference, they often suffer from training instability. To address this challenge, we propose TurboTalk, a two-stage progressive distillation framework that effectively compresses a multi-step audio-driven video diffusion model into a single-step generator. We first adopt Distribution Matching Distillation to obtain a strong and stable 4-step student, and then progressively reduce the denoising steps from 4 to 1 through adversarial distillation. To ensure stable training under extreme step reduction, we introduce a progressive timestep sampling strategy and a self-compare adversarial objective that provides an intermediate adversarial reference, stabilizing the progressive distillation. Our method achieves single-step generation of talking-avatar videos, boosting inference speed by 120× while maintaining high generation quality.
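
The abstract does not pin down the self-compare objective; a consistent reading is that the student's own few-step output serves as the intermediate adversarial reference against which its one-step output is judged. Below is a minimal sketch under that assumption, with a hypothetical audio-conditioned discriminator and student.generate interface.

    import torch
    import torch.nn.functional as F

    def self_compare_adv_losses(discriminator, student, z, audio):
        with torch.no_grad():
            # Intermediate reference: the same student run with more
            # denoising steps stands in for "real" data.
            ref = student.generate(z, audio, num_steps=2)
        fake = student.generate(z, audio, num_steps=1)
        # Non-saturating GAN losses (one common choice; the paper's
        # exact objective may differ).
        loss_d = (F.softplus(-discriminator(ref, audio)).mean()
                  + F.softplus(discriminator(fake.detach(), audio)).mean())
        loss_g = F.softplus(-discriminator(fake, audio)).mean()
        return loss_d, loss_g

Under this reading, using the student's own higher-step output as the reference keeps the adversarial target close to the one-step distribution, which is one way such an objective could damp the instability of jumping straight to real data.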