Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges

arXiv cs.LG / 5/6/2026

Key Points

  • The paper proposes “Structured Diffusion Bridges,” a framework for modality translation that addresses the under-constrained nature of cross-modal mapping.
  • Instead of assuming fully paired data as a hard requirement, it models the admissible solution space and narrows it using alignment constraints, with paired supervision treated as optional.
  • Experiments on synthetic and real modality-translation benchmarks evaluate performance across unpaired, semi-paired, and paired settings, finding consistent results regardless of supervision level.
  • The authors report that the method can achieve near fully-paired quality while substantially relaxing the need for paired data, and it remains applicable even in the unpaired regime.
  • Overall, the work positions diffusion bridges as a flexible foundation for modality translation beyond fully paired datasets.

Abstract

Modality translation is inherently under-constrained, as multiple cross-modal mappings may yield the same marginals. Recent work has shown that diffusion bridges are effective for this task. However, most existing approaches rely on fully paired datasets, thereby imposing a single data-driven constraint. We propose a diffusion-bridge framework that characterizes the space of admissible solutions and restricts it via alignment constraints, treating paired supervision as an optional heuristic rather than a prerequisite. We validate our method on synthetic and real modality translation benchmarks across unpaired, semi-paired, and paired regimes, showing consistent performance across supervision levels. Notably, it achieves near fully-paired quality with a substantial relaxation in pairing requirements while remaining applicable in the unpaired regime. These results highlight diffusion bridges as a flexible foundation for modality translation beyond fully paired data.
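
The claim that several cross-modal mappings can share the same marginals can be illustrated with a toy example. The sketch below is not taken from the paper: the binary "modalities" and the names `couplings` and `marginals` are assumptions chosen purely for illustration. It builds three different joint distributions over a pair of binary variables and checks that all of them have identical marginals, so the marginals alone cannot single out one mapping.

```python
# Toy illustration (hypothetical, not the paper's method): three different
# joint distributions over binary variables X and Y with identical marginals.
# Rows index x, columns index y; each table sums to 1.
couplings = {
    "identity":    [[0.5, 0.0], [0.0, 0.5]],      # y always equals x
    "swap":        [[0.0, 0.5], [0.5, 0.0]],      # y always equals 1 - x
    "independent": [[0.25, 0.25], [0.25, 0.25]],  # y carries no information about x
}


def marginals(joint):
    """Return (p_x, p_y) for a joint table given as a list of rows."""
    p_x = [sum(row) for row in joint]        # sum over y for each x
    p_y = [sum(col) for col in zip(*joint)]  # sum over x for each y
    return p_x, p_y


all_marginals = {name: marginals(joint) for name, joint in couplings.items()}
for name, (p_x, p_y) in all_marginals.items():
    print(f"{name:12s} p_x={p_x} p_y={p_y}")

# True when every coupling shares the same pair of marginals.
same_marginals = (
    len({(tuple(p_x), tuple(p_y)) for p_x, p_y in all_marginals.values()}) == 1
)
print("same marginals:", same_marginals)
```

In this toy setting, paired samples would distinguish "identity" from "swap", which is the role full pairing plays in existing approaches. The abstract's proposal is to narrow the admissible set with alignment constraints instead, with pairing as an optional extra.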