High-Quality Image Generation via Frequency-Aware Flow Matching

arXiv cs.CV / 2026/4/20

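Key points

Flow matching is a framework for realistic image generation that learns to reverse a process in which images are progressively corrupted with Gaussian noise. Because the noise is injected in the latent space, it affects frequency components non-uniformly. As a result, during inference high-frequency components (fine details) tend to emerge later than low-frequency structure.
The paper proposes Frequency-Aware Flow Matching (FreqFlow), which explicitly incorporates frequency-aware conditioning into flow matching through time-dependent adaptive weighting. The goal is to bring out both low-frequency structure and high-frequency detail more effectively throughout sampling.
FreqFlow uses two branches. A frequency branch captures low- and high-frequency components separately, and a spatial branch synthesizes the image in the latent space, guided by the frequency branch's output.
On class-conditional ImageNet-256 generation it reaches an FID of 1.38 (lower is better). This improves on the prior diffusion model DiT by 0.79 FID and on the flow matching model SiT by 0.58 FID.
The authors have released the code on GitHub so that the method can be reproduced and further validated.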
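The first and third key points can be made concrete with a small numerical sketch. It is not taken from the FreqFlow repository, and it rests on several assumptions:

- The low/high split is a radial FFT low-pass mask. The paper may use another decomposition, such as a wavelet transform.
- A synthetic array with a 1/f amplitude spectrum stands in for a natural image or latent.
- Noising follows the linear path x_t = (1 - t) x_0 + t ε, with t = 1 as pure noise.

The helper names (`split_bands`, `band_snr`, and so on) are hypothetical. White Gaussian noise spreads its power evenly across frequencies, while the stand-in image concentrates its power at low frequencies. So at every t, the low band should have a much higher signal-to-noise ratio than the high band.

```python
import numpy as np

# Sketch only: the FFT mask, the 1/f stand-in image and the linear noising path
# x_t = (1 - t) * x0 + t * eps (t = 1 is pure noise) are assumptions, not FreqFlow's code.

def radial_lowpass_mask(h: int, w: int, cutoff: float) -> np.ndarray:
    """True for spatial frequencies whose radius is <= cutoff (in cycles per pixel)."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fy**2 + fx**2) <= cutoff

def split_bands(x: np.ndarray, cutoff: float) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D array into low- and high-frequency parts with low + high == x."""
    mask = radial_lowpass_mask(x.shape[0], x.shape[1], cutoff)
    low = np.real(np.fft.ifft2(np.fft.fft2(x) * mask))
    return low, x - low

def synthetic_image(n: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for a natural image: white noise reshaped to a 1/f amplitude spectrum."""
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    radius[0, 0] = 1.0  # avoid dividing by zero at the DC term
    img = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) / radius))
    return img / img.std()

def band_snr(x0: np.ndarray, t: float, cutoff: float,
             rng: np.random.Generator) -> tuple[float, float]:
    """Per-band signal-to-noise power ratio of x_t = (1 - t) * x0 + t * eps."""
    eps = rng.standard_normal(x0.shape)
    sig_low, sig_high = split_bands((1 - t) * x0, cutoff)
    noise_low, noise_high = split_bands(t * eps, cutoff)
    snr_low = np.mean(sig_low**2) / np.mean(noise_low**2)
    snr_high = np.mean(sig_high**2) / np.mean(noise_high**2)
    return float(snr_low), float(snr_high)

rng = np.random.default_rng(0)
x0 = synthetic_image(64, rng)
for t in (0.9, 0.7, 0.5, 0.3, 0.1):
    snr_low, snr_high = band_snr(x0, t, cutoff=0.1, rng=rng)
    print(f"t={t:.1f}  SNR_low={snr_low:9.3f}  SNR_high={snr_high:9.3f}")
```

Both bands are scaled by the same factor ((1 - t)/t)², so the gap between SNR_low and SNR_high stays roughly constant over t. What differs is when each band climbs above SNR = 1 during sampling from t = 1 toward t = 0. The low band gets there at a larger t (earlier), which matches the coarse-structure-before-fine-detail ordering the paper builds on.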

Abstract

Flow matching models have emerged as a powerful framework for realistic image generation by learning to reverse a corruption process that progressively adds Gaussian noise. However, because noise is injected in the latent domain, its impact on different frequency components is non-uniform. As a result, during inference, flow matching models tend to generate low-frequency components (global structure) in the early stages, while high-frequency components (fine details) emerge only later in the reverse process. Building on this insight, we propose Frequency-Aware Flow Matching (FreqFlow), a novel approach that explicitly incorporates frequency-aware conditioning into the flow matching framework via time-dependent adaptive weighting. We introduce a two-branch architecture: (1) a frequency branch that separately processes low- and high-frequency components to capture global structure and refine textures and edges, and (2) a spatial branch that synthesizes images in the latent domain, guided by the frequency branch's output. By explicitly integrating frequency information into the generation process, FreqFlow ensures that both large-scale coherence and fine-grained details are effectively modeled: low-frequency conditioning reinforces global structure, while high-frequency conditioning enhances texture fidelity and detail sharpness. On the class-conditional ImageNet-256 generation benchmark, our method achieves state-of-the-art performance with an FID of 1.38, surpassing the prior diffusion model DiT and flow matching model SiT by 0.79 and 0.58 FID, respectively. Code is available at https://github.com/OliverRensu/FreqFlow.
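The abstract says frequency-aware conditioning is injected "via time-dependent adaptive weighting" but does not give the weighting function. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a hypothetical sigmoid gate, so that low-frequency conditioning dominates early in sampling (when global structure forms) and high-frequency conditioning takes over late (when fine details emerge). This mirrors the ordering the abstract describes. The function names, `midpoint`, `sharpness`, the placeholder feature arrays, and the time convention (t = 1 noise, t = 0 data) are all illustrative assumptions.

```python
import numpy as np

# HYPOTHETICAL weighting schedule: the sigmoid form, `midpoint` and `sharpness`
# are illustrative assumptions, not FreqFlow's published weighting.
# Time convention assumed: t = 1 is pure noise, t = 0 is clean data,
# and sampling moves from t = 1 towards t = 0.

def adaptive_band_weights(t: float, midpoint: float = 0.5,
                          sharpness: float = 10.0) -> tuple[float, float]:
    """Return (w_low, w_high), summing to 1; low frequencies dominate near t = 1."""
    w_high = 1.0 / (1.0 + np.exp(sharpness * (t - midpoint)))
    return float(1.0 - w_high), float(w_high)

def fuse_frequency_conditioning(low_feat: np.ndarray, high_feat: np.ndarray,
                                t: float) -> np.ndarray:
    """Blend hypothetical low/high frequency-branch features into one
    conditioning tensor that a spatial branch could consume at time t."""
    w_low, w_high = adaptive_band_weights(t)
    return w_low * low_feat + w_high * high_feat

low_feat = np.ones((4, 4))          # placeholder low-frequency features
high_feat = np.full((4, 4), -1.0)   # placeholder high-frequency features
for t in (0.9, 0.5, 0.1):
    w_low, w_high = adaptive_band_weights(t)
    fused = fuse_frequency_conditioning(low_feat, high_feat, t)
    print(f"t={t:.1f}  w_low={w_low:.3f}  w_high={w_high:.3f}  fused mean={fused.mean():+.3f}")
```

A learned weighting, for example one predicted per channel from a timestep embedding, would be an equally plausible design. The repository linked in the abstract is the place to check what FreqFlow actually does.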
