Representation Fréchet Loss for Visual Generation

arXiv cs.CV / 5/1/2026


Key Points

  • The paper argues that Fréchet Distance (FD), previously viewed as impractical for training, can be optimized by computing FD with a large population estimate while using a smaller batch for gradient computation.
  • It introduces “FD-loss,” and shows that training generators with FD-loss improves visual quality across different representation spaces, including strong results in the Inception feature space.
  • The method can turn multi-step generators into effective one-step generators without relying on teacher distillation, adversarial training, or per-sample targets.
  • The authors find that Inception FID can misrank sample quality, motivating a multi-representation metric called FDr^k for more reliable evaluation.
  • Overall, the work encourages using distributional distances in multiple representation spaces both as training objectives and as evaluation metrics for generative models.

Abstract

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256×256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
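To make the core quantity concrete, here is a minimal numpy sketch of the Fréchet Distance between two Gaussians fit to feature sets, FD = ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^{1/2}). The variable names and the toy feature arrays are illustrative, not from the paper; in the actual FD-loss, the statistics would be estimated from a large generated population (e.g., 50k samples) while only a small batch carries gradients.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FD between N(mu1, sigma1) and N(mu2, sigma2).

    Uses the fact that Tr((S1 S2)^{1/2}) equals the sum of square roots
    of the eigenvalues of S1 @ S2, which are real and nonnegative when
    both matrices are positive semidefinite.
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

# Toy stand-ins for real and generated representation features
# (hypothetical shapes; real use would employ e.g. Inception features).
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(5000, 8))
gen_population = rng.normal(0.1, 1.0, size=(5000, 8))  # large population estimate

mu_r, sig_r = real_feats.mean(0), np.cov(real_feats, rowvar=False)
mu_g, sig_g = gen_population.mean(0), np.cov(gen_population, rowvar=False)

fd = frechet_distance(mu_g, sig_g, mu_r, sig_r)
```

In training, only a small batch of the generated population would be differentiable while the remaining population statistics are held fixed, which is the decoupling the abstract describes; the sketch above only shows the distance computation itself.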