Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier

arXiv cs.CV / 3/24/2026


Key Points

  • The paper proposes Uni-Classifier (Uni-C), a plug-and-play module that uses video diffusion priors to guide the denoising steps of upstream generative models so that their outputs better match the input distributions expected by downstream models.
  • It targets a common workflow problem where chaining different generative models (e.g., 2D-to-video or 2D-to-3D pipelines) causes quality loss due to distributional mismatch.
  • Uni-C is designed to work both as part of multi-model pipelines (to improve end-to-end generation) and as a standalone enhancer for individual generative models.
  • Experiments across video and 3D generation tasks report consistent improvements in generation quality, suggesting strong generalization and versatility.

Abstract

In practical AI workflows, complex tasks often involve chaining multiple generative models, such as using a video or 3D generation model after a 2D image generator. However, distributional mismatches between the output of upstream models and the expected input of downstream models frequently degrade overall generation quality. To address this issue, we propose Uni-Classifier (Uni-C), a simple yet effective plug-and-play module that leverages video diffusion priors to guide the denoising process of preceding models, thereby aligning their outputs with downstream requirements. Uni-C can also be applied independently to enhance the output quality of individual generative models. Extensive experiments across video and 3D generation tasks demonstrate that Uni-C consistently improves generation quality in both workflow-based and standalone settings, highlighting its versatility and strong generalization capability.
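To make the core idea concrete: guiding a denoising process with an external signal is the classic classifier-guidance pattern, where the model's noise estimate is shifted by the gradient of a guidance term. The toy 1-D sketch below illustrates that pattern only; Uni-C's actual architecture is not described in this summary, and every name and value here (`denoiser`, `guidance_grad`, `scale`, the quadratic guidance target) is an illustrative assumption, not the authors' implementation.

```python
def denoiser(x_t, t):
    """Stand-in for an upstream model's noise prediction (toy linear rule)."""
    return 0.5 * x_t

def guidance_grad(x_t, target=2.0):
    """Gradient of a quadratic 'classifier' log-likelihood that pulls x_t
    toward the distribution a downstream model expects (here: near target)."""
    return -(x_t - target)

def guided_step(x_t, t, scale=0.1, lr=0.05):
    """One denoising step whose noise estimate is shifted by the guidance
    gradient -- the classifier-guidance pattern the paper builds on."""
    eps = denoiser(x_t, t) - scale * guidance_grad(x_t)
    return x_t - lr * eps

# Run the guided chain alongside an unguided one (scale=0) for comparison.
x_guided, x_plain = 5.0, 5.0
for t in range(500):
    x_guided = guided_step(x_guided, t)
    x_plain = guided_step(x_plain, t, scale=0.0)
```

After convergence, `x_guided` ends up closer to the guidance target than `x_plain`, mirroring how Uni-C is said to steer an upstream model's outputs toward a downstream model's expected inputs without retraining either model.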