Diffusion Models with Double Guidance: Generate with aggregated datasets

arXiv stat.ML / 3/31/2026

Key Points

  • The paper addresses conditional diffusion models trained on merged datasets: building large annotated datasets is expensive, and because the attribute sets differ across sources, naive merging leaves “block-wise” missing conditions.
  • It proposes “Diffusion Model with Double Guidance,” which enables precise conditional generation even when the training data never contains all conditions together.
  • The method aims to preserve rigorous control over multiple attributes without requiring joint annotations, improving controllability in practical missing-condition scenarios.
  • Experiments on molecular and image generation show the approach outperforms baselines in both matching target conditional distributions and maintaining controllability under missing-condition settings.
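To make the "block-wise" missing pattern concrete, here is a minimal sketch of what naive concatenation does to two datasets annotated with different attributes. The dataset contents and the names `attr_1`, `attr_2`, `concatenate`, and `observed_mask` are hypothetical and chosen only for illustration; nothing here is taken from the paper's code.

```python
# Hypothetical toy datasets: each source annotates a different attribute.
dataset_a = [
    {"sample": "a1", "attr_1": 0.2},
    {"sample": "a2", "attr_1": 0.7},
]
dataset_b = [
    {"sample": "b1", "attr_2": 1.5},
    {"sample": "b2", "attr_2": 3.1},
]


def concatenate(datasets, condition_names):
    """Naively stack datasets, filling conditions a source lacks with None."""
    merged = []
    for rows in datasets:
        for row in rows:
            merged.append(
                {
                    "sample": row["sample"],
                    **{name: row.get(name) for name in condition_names},
                }
            )
    return merged


def observed_mask(rows, condition_names):
    """Return one list of booleans per row: True where a condition is present."""
    return [[row[name] is not None for name in condition_names] for row in rows]


conditions = ["attr_1", "attr_2"]
merged = concatenate([dataset_a, dataset_b], conditions)
mask = observed_mask(merged, conditions)

# Count rows in which every condition is annotated at the same time.
fully_annotated = sum(all(row_mask) for row_mask in mask)

for row, row_mask in zip(merged, mask):
    print(row["sample"], row_mask)
print("rows with every condition:", fully_annotated)
```

The mask forms two rectangular blocks: rows from the first source observe only `attr_1`, and rows from the second observe only `attr_2`. No row carries both, which is the setting the key points describe as training data that never contains all conditions together.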

Abstract

Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a common strategy. However, the sets of attributes across datasets are often inconsistent, and their naive concatenation typically leads to block-wise missing conditions. This presents a significant challenge for conditional generative modeling when the multiple attributes are used jointly as conditions, thereby limiting the model's controllability and applicability. To address this issue, we propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously. Our method maintains rigorous control over multiple conditions without requiring joint annotations. We demonstrate its effectiveness in molecular and image generation tasks, where it outperforms existing baselines both in alignment with target conditional distributions and in controllability under missing condition settings.
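
The abstract does not state the guidance rule itself, so the following is background only, not the paper's method. It is a standard Bayes-rule identity, written under the assumption of two conditions $c_1$ and $c_2$ and a noisy sample $x_t$ (all three symbols are notation introduced here), and it suggests why joint conditioning is hard without joint annotations:

```latex
\begin{align}
p(x_t \mid c_1, c_2)
  &= \frac{p(x_t \mid c_1)\, p(c_2 \mid x_t, c_1)}{p(c_2 \mid c_1)}, \\
\nabla_{x_t} \log p(x_t \mid c_1, c_2)
  &= \nabla_{x_t} \log p(x_t \mid c_1)
   + \nabla_{x_t} \log p(c_2 \mid x_t, c_1).
\end{align}
```

The denominator $p(c_2 \mid c_1)$ does not depend on $x_t$, so it drops out of the gradient. The first term can be learned from samples annotated with $c_1$ alone, but the second term involves $c_1$ and $c_2$ together, and with block-wise missing conditions no training sample provides both.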