AI Navigate

The Quadratic Geometry of Flow Matching: Semantic Granularity Alignment for Text-to-Image Synthesis

arXiv cs.CV / 3/12/2026


Key Points

  • The paper analyzes the optimization dynamics of generative fine-tuning under Flow Matching and observes that the standard MSE objective can be written as a quadratic form governed by a dynamically evolving Neural Tangent Kernel (NTK).
  • This view exposes a latent Data Interaction Matrix: diagonal terms represent independent per-sample learning, while off-diagonal terms encode residual correlations between heterogeneous features. Standard training optimizes these cross terms only implicitly, without explicit control.
  • To address this, it proposes Semantic Granularity Alignment (SGA), which intentionally modulates the vector residual field to mitigate gradient conflicts during training.
  • Experiments on DiT and U-Net indicate that SGA improves the efficiency-quality trade-off by accelerating convergence and preserving structural integrity of the generated images.
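
The diagonal/off-diagonal picture in the key points can be illustrated numerically. The following is a minimal sketch, not the paper's method or SGA itself. It assumes a toy linear velocity model `v(x) = W x`, a linear interpolation path `x_t = (1 - t) x0 + t x1` with target `u = x1 - x0`, and no text conditioning. The function name `interaction_matrix` and all variable names are hypothetical.

```python
import numpy as np

def interaction_matrix(W, x_t, u):
    """Return (M, G) where G[i] is the flattened gradient of
    0.5 * ||W x_i - u_i||^2 with respect to W, and M[i, j] = <G[i], G[j]>."""
    r = x_t @ W.T - u                              # residuals, shape (N, d)
    # Per-sample gradient w.r.t. W is the outer product r_i x_i^T.
    G = np.einsum("ni,nj->nij", r, x_t).reshape(len(x_t), -1)
    return G @ G.T, G

rng = np.random.default_rng(0)
N, d = 6, 4
x0 = rng.normal(size=(N, d))        # noise samples (assumed Gaussian)
x1 = rng.normal(size=(N, d))        # stand-in "data" samples
t = rng.uniform(size=(N, 1))        # one time value per sample
x_t = (1 - t) * x0 + t * x1         # assumed linear interpolation path
u = x1 - x0                         # target velocity for that path
W = 0.1 * rng.normal(size=(d, d))   # toy model parameters

M, G = interaction_matrix(W, x_t, u)

diag_part = np.trace(M)             # independent per-sample contributions
off_part = M.sum() - diag_part      # cross-sample interaction (may be negative)

# Under gradient flow on L = mean_i 0.5 * ||r_i||^2, the instantaneous
# loss decrease rate is ||mean gradient||^2 = sum_ij M[i, j] / N^2.
mean_grad = G.mean(axis=0)
decrease_rate = M.sum() / N**2

print("diagonal sum:    ", diag_part)
print("off-diagonal sum:", off_part)
print("decrease rate:   ", decrease_rate)
```

In this toy setting each diagonal entry is a squared gradient norm and therefore non-negative, while individual off-diagonal entries can take either sign. A negative entry `M[i, j]` means the gradients of samples `i` and `j` point in conflicting directions, which is the kind of interaction the summary describes as not explicitly controlled by standard training.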

Abstract

In this work, we analyze the optimization dynamics of generative fine-tuning. We observe that under the Flow Matching framework, the standard MSE objective can be formulated as a Quadratic Form governed by a dynamically evolving Neural Tangent Kernel (NTK). This geometric perspective reveals a latent Data Interaction Matrix, where diagonal terms represent independent sample learning and off-diagonal terms encode residual correlation between heterogeneous features. Although standard training implicitly optimizes these cross-term interferences, it does so without explicit control; moreover, the prevailing data-homogeneity assumption may constrain the model's effective capacity. Motivated by this insight, we propose Semantic Granularity Alignment (SGA), using Text-to-Image synthesis as a testbed. SGA engineers targeted interventions in the vector residual field to mitigate gradient conflicts. Evaluations across DiT and U-Net architectures confirm that SGA advances the efficiency-quality trade-off by accelerating convergence and improving structural integrity.
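
To make the quadratic-form statement concrete, here is a standard gradient-flow calculation written under assumed notation. It is a sketch of how such a form typically arises, not a reproduction of the paper's derivation, and the symbols `z_i`, `u_i`, `r_i`, `J_i`, `K_ij`, and `\tau` are introduced here for illustration only.

```latex
% Assumed notation (not taken from the paper):
%   z_i = (x_{t,i}, t_i, c_i)   network input for sample i (noisy latent, time, condition)
%   u_i                         flow-matching target velocity for sample i
%   r_i = v_\theta(z_i) - u_i   residual
%   J_i = \partial v_\theta(z_i) / \partial \theta   per-sample Jacobian
%   \tau                        continuous training time, d\theta/d\tau = -\nabla_\theta \mathcal{L}
\begin{align}
\mathcal{L}(\theta) &= \frac{1}{2N}\sum_{i=1}^{N} \lVert r_i \rVert^2,
\qquad
\nabla_\theta \mathcal{L} = \frac{1}{N}\sum_{j=1}^{N} J_j^\top r_j, \\
\frac{d\mathcal{L}}{d\tau}
&= -\lVert \nabla_\theta \mathcal{L} \rVert^2
 = -\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} r_i^\top K_{ij}\, r_j,
\qquad K_{ij} = J_i J_j^\top, \\
&= -\frac{1}{N^2}\Big(
   \underbrace{\sum_{i} r_i^\top K_{ii}\, r_i}_{\text{diagonal: per-sample learning}}
 + \underbrace{\sum_{i \neq j} r_i^\top K_{ij}\, r_j}_{\text{off-diagonal: cross-sample interaction}}
 \Big).
\end{align}
```

Here `K_ij` is the empirical NTK block between samples `i` and `j`. Because it depends on the current parameters, it evolves during training. Each diagonal term is non-negative since `K_ii` is positive semidefinite, whereas an off-diagonal term can be negative, in which case the two samples' updates partially cancel.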