Conditional Diffusion Posterior Alignment for Sparse-View CT Reconstruction

arXiv cs.LG / 4/27/2026

Key Points

  • The paper targets the obstacles to scaling diffusion-based sparse-view CT reconstruction to large 3D volumes: the memory and compute cost of 3D models, the scarcity of 3D training data, and inter-slice inconsistencies when a 2D model processes each slice independently.
  • It introduces Conditional Diffusion Posterior Alignment (CDPA), which uses a 2D U-Net diffusion model conditioned on an initial 3D reconstruction to improve consistency across slices while explicitly enforcing data consistency with measured projections.
  • The authors report state-of-the-art results on both synthetic and real Cone Beam CT (CBCT) datasets, with ablation studies indicating that conditioning and data-consistency alignment reinforce each other.
  • They further show the approach can enhance fast denoising U-Nets, achieving near diffusion-model reconstruction quality at a much lower computational cost.
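
To make the slice-wise conditioning idea concrete, here is a minimal sketch of one way a 2D model could be given access to an initial 3D reconstruction. Everything in it is an assumption made for illustration: the summary does not say how the conditioning is implemented, so pairing each slice with its counterpart as a second input channel is only one plausible choice, and `conditioned_inputs`, `apply_slicewise`, and `toy_model` are hypothetical names. The toy model simply averages its two channels and stands in for a trained network.

```python
from typing import Callable, List

Slice = List[List[float]]   # one H x W image
Volume = List[Slice]        # D slices stacked along the axial direction


def conditioned_inputs(noisy: Volume, initial: Volume) -> List[List[Slice]]:
    """Build one 2-channel input per slice: [noisy slice, initial-recon slice].

    Assumption: conditioning by channel pairing. Other schemes are possible.
    """
    if len(noisy) != len(initial):
        raise ValueError("volumes must have the same number of slices")
    return [[n, c] for n, c in zip(noisy, initial)]


def apply_slicewise(
    model: Callable[[List[Slice]], Slice], noisy: Volume, initial: Volume
) -> Volume:
    """Run a 2D model on every slice, each time with its conditioning slice."""
    return [model(channels) for channels in conditioned_inputs(noisy, initial)]


def toy_model(channels: List[Slice]) -> Slice:
    """Hypothetical stand-in for a trained 2D network: average both channels."""
    noisy, cond = channels
    return [[0.5 * (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(noisy, cond)]


# Three 2 x 2 slices: a constant "noisy" volume and a structured initial volume.
noisy = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
initial = [[[0.0, 2.0], [4.0, 6.0]] for _ in range(3)]
out = apply_slicewise(toy_model, noisy, initial)
```

The point of the sketch is only that the 2D model never needs to hold the full volume in memory, yet every slice it processes carries information derived from a reconstruction that was computed in 3D.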

Abstract

Computed Tomography (CT) is a widely used imaging modality in medical and industrial applications. To limit radiation exposure and measurement time, there is growing interest in sparse-view CT, where the number of projection views is significantly reduced. Deep neural networks, especially generative diffusion models, have shown great promise in improving reconstruction quality in sparse-view CT. However, these methods struggle to scale to large 3D volumes for three reasons: (i) the high memory and computational requirements of 3D models, (ii) the lack of large 3D training datasets, and (iii) the inconsistencies across slices when a 2D model is applied independently to each slice. We overcome these limitations by combining conditional diffusion with explicit data consistency in a method we call Conditional Diffusion Posterior Alignment (CDPA), which enables scalable 3D sparse-view CT reconstruction. A 2D U-Net diffusion model is conditioned on an initial 3D reconstruction to improve inter-slice consistency, and a data-consistency alignment step matches the result to the measured projections. Experiments on synthetic and real Cone Beam CT (CBCT) data show state-of-the-art performance, with ablations that confirm the synergistic effects of the proposed pipeline. Finally, we show that the same principles also strengthen fast denoising U-Nets, yielding near-diffusion quality at a fraction of the computational cost.
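
The notion of enforcing consistency with measured projections can be illustrated with a generic least-squares example. This is a minimal sketch under stated assumptions, not the authors' alignment procedure: the forward operator `A` is a small random matrix standing in for a CT projector, `x_prior` is a perturbed ground truth standing in for a network output, and the refinement is plain gradient descent on 0.5 * ||A x - y||^2. All function and variable names are hypothetical.

```python
import random


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r, the adjoint (back-projection) of matvec."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def residual_norm(A, x, y):
    """Euclidean norm of the measurement residual A x - y."""
    return sum((p - m) ** 2 for p, m in zip(matvec(A, x), y)) ** 0.5


def data_consistency(A, y, x0, step, n_iters):
    """Gradient descent on 0.5 * ||A x - y||^2, starting from x0."""
    x = list(x0)
    for _ in range(n_iters):
        residual = [p - m for p, m in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


random.seed(0)
n_pixels, n_measurements = 16, 8  # fewer measurements than unknowns
A = [[random.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_measurements)]
x_true = [random.random() for _ in range(n_pixels)]
y = matvec(A, x_true)  # noiseless toy measurements

# Stand-in for a learned prior's output: the truth plus a small perturbation.
x_prior = [xi + random.gauss(0, 0.1) for xi in x_true]

# 1 / ||A||_F^2 is a conservative step size: it never exceeds 1 / ||A||_2^2,
# so each iteration cannot increase the objective.
step = 1.0 / sum(a * a for row in A for a in row)
x_dc = data_consistency(A, y, x_prior, step, 200)

print(residual_norm(A, x_prior, y), residual_norm(A, x_dc, y))
```

Because the system is underdetermined, the measurements alone cannot pin down the image; the refinement only moves the starting estimate toward agreement with `y`, which is why the quality of the starting estimate matters. A real pipeline would replace the random matrix with a tomographic projector and its adjoint.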