Local Mechanisms of Compositional Generalization in Conditional Diffusion

Apple Machine Learning Journal / 4/28/2026


Key Points

  • The paper examines how conditional diffusion models can perform compositional generalization, but notes that the underlying mechanisms are still poorly understood.
  • It studies “length generalization,” where a model generates images containing more objects than it saw during training, as a concrete test of compositionality.
  • Using a controlled CLEVR setup, the authors find length generalization succeeds in some scenarios but fails in others, implying that models do not always learn the full compositional structure.
  • The work then investigates the model-side “local mechanisms” that may explain when and why compositional generalization emerges in conditional diffusion.
  • Overall, the findings suggest that compositional generalization in diffusion models is partial and contingent on the training setup and model structure, rather than a guaranteed behavior.

Conditional diffusion models appear capable of compositional generalization, i.e., generating convincing samples for out-of-distribution combinations of conditioners, but the mechanisms underlying this ability remain unclear. To make this concrete, we study length generalization, the ability to generate images with more objects than seen during training. In a controlled CLEVR setting (Johnson et al., 2017), we find that length generalization is achievable in some cases but not others, suggesting that models only sometimes learn the underlying compositional structure. We then investigate…
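To make the setup concrete, a length-generalization experiment conditions the diffusion model on an object count and samples with counts beyond the training range. The sketch below is a toy illustration under stated assumptions, not the paper's model: `toy_denoiser` is a hypothetical stand-in for a trained noise-prediction network, and the scalar count plays the role of the compositional conditioner.

```python
import numpy as np

def toy_denoiser(x, t, num_objects):
    # Hypothetical stand-in for a trained noise-prediction network
    # eps_theta(x_t, t, c). A real model would be a neural network
    # conditioned on the object count c; this is a deterministic toy.
    return 0.1 * x + 0.01 * num_objects

def conditional_ddpm_sample(num_objects, steps=50, dim=16, seed=0):
    """Minimal DDPM-style reverse process conditioned on an object count.

    Length generalization asks whether sampling with `num_objects` larger
    than any value seen in training still yields coherent samples.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)   # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(dim)             # start from pure noise
    for t in reversed(range(steps)):
        eps = toy_denoiser(x, t, num_objects)      # predicted noise
        # Standard DDPM posterior mean for x_{t-1} given x_t
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                  # no noise at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(dim)
    return x

# Sampling "in distribution" vs. beyond the training range uses the
# same loop; only the conditioner value changes.
in_range = conditional_ddpm_sample(num_objects=3)
out_of_range = conditional_ddpm_sample(num_objects=12)
```

Whether the out-of-range sample is actually coherent is exactly the empirical question the paper probes; the sampling machinery itself is identical in both cases.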
