Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning

arXiv cs.LG / 4/13/2026


Key Points

  • Classifier-free guidance (CFG) is effective for diffusion models but is constrained by high sampling costs, motivating alternatives like consistency models (CMs) that sample in one or a few steps.
  • The paper argues that existing CM guidance approaches typically require distillation from a separate diffusion model (DM) teacher, limiting CM guidance to “consistency distillation” settings.
  • It proposes Joint Flow Distribution Learning (JFDL), a lightweight post-hoc alignment method that enables guidance in a pre-trained CM by treating the CM as an ODE solver.
  • The authors verify via normality tests that the variance-exploding noise implied by the unconditional and conditional velocity fields is Gaussian, supporting the method's assumptions.
  • Experiments show that JFDL adds an adjustable “guidance knob” to CMs and improves generation quality (lower FID) on CIFAR-10 and ImageNet 64x64, enabling effective guidance without a DM teacher for the first time in this framing.

Abstract

Classifier-free Guidance (CFG) lets practitioners trade off fidelity against diversity in Diffusion Models (DMs). The practicality of CFG is, however, hindered by the sampling cost of DMs. On the other hand, Consistency Models (CMs) generate images in one or a few steps, but existing guidance methods require knowledge distillation from a separate DM teacher, limiting CFG to Consistency Distillation (CD) methods. We propose Joint Flow Distribution Learning (JFDL), a lightweight alignment method enabling guidance in a pre-trained CM. Treating a pre-trained CM as an ordinary differential equation (ODE) solver, we verify with normality tests that the variance-exploding noise implied by the velocity fields of the unconditional and conditional distributions is Gaussian. In practice, JFDL equips CMs with the familiar adjustable guidance knob, yielding guided images with characteristics similar to CFG. Applied to a CM originally trained with Consistency Training (CT) that could only perform conditional sampling, JFDL unlocks guided generation and reduces FID on both the CIFAR-10 and ImageNet 64x64 datasets. This is the first time that CMs receive effective guidance post-hoc without a DM teacher, thus bridging a key gap in current methods for CMs.
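The "guidance knob" mentioned above follows the standard CFG recipe: combine an unconditional and a conditional velocity field with a scale w. The sketch below only illustrates that generic combination, not JFDL's actual training or alignment procedure; both velocity functions are hypothetical placeholders standing in for the trained networks.

```python
def v_uncond(x: float, t: float) -> float:
    # Hypothetical stand-in for the CM's unconditional velocity field;
    # a real model would evaluate its trained network here.
    return -x / (t + 1e-3)

def v_cond(x: float, t: float, y: float) -> float:
    # Hypothetical stand-in for the class-conditional velocity field.
    return -(x - y) / (t + 1e-3)

def guided_velocity(x: float, t: float, y: float, w: float) -> float:
    """CFG-style guided velocity: v_u + w * (v_c - v_u).

    w = 0 recovers the unconditional field, w = 1 the conditional one,
    and w > 1 extrapolates past it, trading diversity for fidelity.
    """
    vu = v_uncond(x, t)
    vc = v_cond(x, t, y)
    return vu + w * (vc - vu)
```

In an ODE-solver view of the CM, this guided velocity would simply replace the conditional velocity at each integration step, with w exposed to the user as the guidance scale.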