Representation Fréchet Loss for Visual Generation

arXiv cs.CV / 5/1/2026


Key Points

  • The paper argues that Fréchet Distance (FD), previously viewed as impractical for training, can be optimized by computing FD with a large population estimate while using a smaller batch for gradient computation.
  • It introduces “FD-loss,” and shows that training generators with FD-loss improves visual quality across different representation spaces, including strong results in the Inception feature space.
  • The method can turn multi-step generators into effective one-step generators without relying on teacher distillation, adversarial training, or per-sample targets.
  • The authors find that Inception FID can misrank sample quality, motivating a multi-representation metric called FDr^k for more reliable evaluation.
  • Overall, the work encourages using distributional distances in multiple representation spaces both as training objectives and as evaluation metrics for generative models.

Abstract

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256×256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
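To make the core quantity concrete, here is a minimal numpy sketch of the Fréchet Distance between two Gaussians fit to feature sets, FD = ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^{1/2}). The variable names and the toy feature arrays are illustrative, not from the paper; in the actual FD-loss, the statistics would be estimated from a large generated population (e.g., 50k samples) while only a small batch carries gradients.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FD between N(mu1, sigma1) and N(mu2, sigma2).

    Uses the fact that Tr((S1 S2)^{1/2}) equals the sum of square roots
    of the eigenvalues of S1 @ S2, which are real and nonnegative when
    both matrices are positive semidefinite.
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

# Toy stand-ins for real and generated representation features
# (hypothetical shapes; real use would employ e.g. Inception features).
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(5000, 8))
gen_population = rng.normal(0.1, 1.0, size=(5000, 8))  # large population estimate

mu_r, sig_r = real_feats.mean(0), np.cov(real_feats, rowvar=False)
mu_g, sig_g = gen_population.mean(0), np.cov(gen_population, rowvar=False)

fd = frechet_distance(mu_g, sig_g, mu_r, sig_r)
```

In training, only a small batch of the generated population would be differentiable while the remaining population statistics are held fixed, which is the decoupling the abstract describes; the sketch above only shows the distance computation itself.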