AI Navigate

Sinkhorn-Drifting Generative Models

arXiv cs.LG / 3/16/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper establishes a precise link between drifting generative dynamics and Sinkhorn-divergence gradient flows, showing that both share a cross-minus-self structure built from normalizations of the same Gibbs kernel.
  • In a particle discretization, the drift decomposes into an attractive term toward the target and a repulsive self-correction term toward the current model; the Sinkhorn divergence shares this structure, but defines each term via entropic optimal-transport couplings obtained through two-sided Sinkhorn scaling (see the sketch after this list).
  • The work resolves an identifiability gap in prior drifting formulations: leveraging the definiteness of the Sinkhorn divergence, it proves that zero drift implies the model distribution matches the target.
  • Experiments show improved stability and one-step generation quality: on FFHQ-ALAE at low temperature, mean FID drops from 187.7 to 37.1 and mean latent EMD from 453.3 to 144.4, while MNIST retains full class coverage; these gains come at the cost of additional training time.
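
To make the cross-minus-self decomposition concrete, here is a minimal NumPy sketch of a one-sided drift field of the kind the paper describes. The Gaussian Gibbs kernel, the uniform sample weights, and all names here are illustrative assumptions, not the paper's implementation: each particle is pulled by a kernel-weighted average toward target samples (cross term) and pushed by a kernel-weighted average toward the current model samples (self term), with each kernel normalized over one side only.

```python
import numpy as np

def one_sided_drift(x, y, eps=0.1):
    """Cross-minus-self drift with one-sided normalized Gibbs kernels.

    x   : (n, d) array of current model particles   (illustrative)
    y   : (m, d) array of target samples            (illustrative)
    eps : kernel temperature (Gaussian kernel assumed for the sketch)
    """
    def gibbs_weights(a, b):
        # Pairwise squared distances, shape (len(a), len(b)).
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / eps)
        # One-sided normalization: each row sums to 1; the second
        # marginal is left unconstrained (contrast with the two-sided
        # Sinkhorn scaling sketched after the abstract).
        return w / w.sum(axis=1, keepdims=True)

    cross = gibbs_weights(x, y) @ y  # attraction toward target samples
    self_ = gibbs_weights(x, x) @ x  # repulsive / self-correction term
    return cross - self_
```

In the paper's framing, replacing this row-only normalization with full two-sided Sinkhorn scaling yields the entropic couplings of the Sinkhorn divergence; drifting interpolates between the two normalizations.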

Abstract

We establish a theoretical link between the recently proposed "drifting" generative dynamics and gradient flows induced by the Sinkhorn divergence. In a particle discretization, the drift field admits a cross-minus-self decomposition: an attractive term toward the target distribution and a repulsive/self-correction term toward the current model, both expressed via one-sided normalized Gibbs kernels. We show that Sinkhorn divergence yields an analogous cross-minus-self structure, but with each term defined by entropic optimal-transport couplings obtained through two-sided Sinkhorn scaling (i.e., enforcing both marginals). This provides a precise sense in which drifting acts as a surrogate for a Sinkhorn-divergence gradient flow, interpolating between one-sided normalization and full two-sided Sinkhorn scaling. Crucially, this connection resolves an identifiability gap in prior drifting formulations: leveraging the definiteness of the Sinkhorn divergence, we show that zero drift (equilibrium of the dynamics) implies that the model and target measures match. Experiments show that Sinkhorn drifting reduces sensitivity to kernel temperature and improves one-step generative quality, trading off additional training time for a more stable optimization, without altering the inference procedure used by drift methods. These theoretical gains translate to strong low-temperature improvements in practice: on FFHQ-ALAE at the lowest temperature setting we evaluate, Sinkhorn drifting reduces mean FID from 187.7 to 37.1 and mean latent EMD from 453.3 to 144.4, while on MNIST it preserves full class coverage across the temperature sweep. Project page: https://mint-vu.github.io/SinkhornDrifting/
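
For contrast with the one-sided sketch above, the following shows the two-sided Sinkhorn scaling and the resulting cross-minus-self divergence the abstract refers to. This is a textbook balanced Sinkhorn loop on uniform empirical measures, not the paper's code: the entropic regularization term is omitted from the reported cost for brevity, and all names and iteration counts are illustrative.

```python
import numpy as np

def entropic_cost(x, y, eps=0.1, iters=500):
    """Transport cost under the entropic OT coupling between two
    uniform empirical measures (balanced Sinkhorn scaling sketch).

    Note: for small eps a log-domain implementation is numerically
    safer; this plain version is for illustration only.
    """
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)  # Gibbs kernel

    # Two-sided Sinkhorn scaling: alternate rescalings until BOTH
    # marginals of the coupling match a and b.
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # entropic OT coupling
    return (P * d2).sum()

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased cross-minus-self combination (entropy term omitted)."""
    return (entropic_cost(x, y, eps)
            - 0.5 * entropic_cost(x, x, eps)
            - 0.5 * entropic_cost(y, y, eps))
```

The last function makes the cross-minus-self structure explicit: a cross term between model and target minus self terms on each measure. It is this combination whose definiteness (zero only when the two measures coincide) underpins the identifiability result stated above.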