Unifying Deep Stochastic Processes for Image Enhancement

arXiv cs.CV / 5/5/2026

Key Points

  • The paper proposes a unified framework for stochastic image enhancement by grouping prior work into three continuous-time process families: unconditional diffusion models, Ornstein–Uhlenbeck (OU) processes, and diffusion bridges.
  • It shows that these seemingly different methods can all be derived from a common stochastic differential equation (SDE) formulation (sketched after this list), differing mainly in their drift and diffusion terms, terminal distributions, and boundary conditions.
  • The framework isolates orthogonal design choices, showing that noise schedulers and samplers can be selected independently of the core stochastic process formulation.
  • Through controlled experiments across multiple image enhancement tasks with identical architectures and training protocols, the authors find that no single method consistently dominates; instead, they pinpoint which specific design choices most affect performance.
  • The authors release ItoVision, a modular PyTorch library implementing the unified framework, to speed up prototyping and enable fairer comparisons.
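
For intuition, here is a rough sketch of the shared formulation the first two points refer to. The notation is generic and assumed for illustration; the concrete drift choices below are standard textbook parameterizations, not necessarily the exact ones used in the paper.

```latex
% Common continuous-time template: a forward SDE driven by a Wiener process w_t
\[
  \mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w_t
\]
% Unconditional diffusion (VP-type): drift ignores the degraded input y,
%   f(x_t, t) = -\tfrac{1}{2}\beta(t)\,x_t, with terminal state x_T \sim \mathcal{N}(0, I).
% Ornstein--Uhlenbeck process: drift mean-reverts toward the degraded input y,
%   f(x_t, t) = \theta(t)\,(y - x_t).
% Diffusion bridge: the drift is augmented so the endpoint is pinned to y,
%   f(x_t, t) + g(t)^2\,\nabla_{x_t}\log p(x_T = y \mid x_t)  (Doob h-transform).
```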

Abstract

Deep stochastic processes have recently become a central paradigm for image enhancement, with many methods explicitly conditioning the stochastic trajectory on the degraded input. However, the relationship between these conditional processes and standard diffusion models remains unclear. In this work, we introduce a unified perspective on stochastic image enhancement by classifying recent methods into three families of continuous-time processes: unconditional diffusion models, Ornstein-Uhlenbeck (OU) processes, and diffusion bridges. We show that all of these approaches arise from a common stochastic differential equation (SDE) formulation. This framework makes explicit that seemingly disparate methods differ primarily in their drift and diffusion terms, terminal distributions, and boundary conditions, while schedulers and samplers constitute orthogonal design choices. Leveraging this unification, we conduct a controlled empirical study across multiple image enhancement tasks using identical architectures and training protocols. Our results reveal no consistently dominant method; instead, we identify and disentangle the specific design choices that most strongly influence performance. Finally, we release ItoVision, a modular PyTorch library that implements the unified framework and enables rapid prototyping and fair comparison of stochastic image enhancement methods.
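
To make the point that samplers are orthogonal to the process formulation concrete, below is a minimal, hypothetical sketch (not the ItoVision API, whose interface is not described here) of a single Euler–Maruyama simulation loop in PyTorch into which different drift choices can be plugged. Function names, signatures, and constants are assumptions made for illustration.

```python
# Hypothetical sketch (not ItoVision's actual API): one Euler-Maruyama loop
# that accepts interchangeable drift and diffusion coefficients, so the
# process family is chosen independently of the numerical sampler.
import torch


def vp_drift(x, t, y=None, beta=10.0):
    # Unconditional (variance-preserving) diffusion: drift ignores the degraded input y.
    return -0.5 * beta * x


def ou_drift(x, t, y, theta=5.0):
    # Ornstein-Uhlenbeck drift: mean-reverts toward the degraded input y.
    return theta * (y - x)


def constant_diffusion(t, sigma=1.0):
    # Fixed diffusion coefficient; a time-dependent schedule could be swapped in here.
    return sigma


def euler_maruyama(x0, drift, diffusion, y=None, steps=100, T=1.0):
    # Simulate dx = drift(x, t) dt + diffusion(t) dw with a fixed-step Euler scheme.
    x, dt = x0.clone(), T / steps
    for i in range(steps):
        t = i * dt
        x = x + drift(x, t, y) * dt + diffusion(t) * (dt ** 0.5) * torch.randn_like(x)
    return x


# Example: run the forward OU process from a clean image toward its degraded version.
clean = torch.rand(1, 3, 64, 64)
degraded = clean + 0.1 * torch.randn_like(clean)
x_T = euler_maruyama(clean, ou_drift, constant_diffusion, y=degraded)
```

Swapping `ou_drift` for `vp_drift` (or for a bridge-style conditional drift) changes the process family without touching the sampler, which is the kind of separation the paper's framework makes explicit.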