VolDiT: Controllable Volumetric Medical Image Synthesis with Diffusion Transformers
arXiv cs.CV / 3/27/2026
Key Points
- VolDiT proposes the first purely transformer-based diffusion model for volumetric medical image synthesis, moving beyond common latent diffusion approaches built on convolutional U-Net backbones.
- The method extends diffusion transformers to native 3D data using volumetric patch embeddings and global self-attention over 3D tokens to better capture global context.
- For structured guidance, VolDiT introduces a timestep-gated control adapter that converts segmentation masks into learnable control tokens, modulating transformer layers during denoising.
- Experiments on high-resolution 3D medical image synthesis tasks report improved global coherence, higher generative fidelity, and stronger controllability than state-of-the-art U-Net-based 3D latent diffusion models.
- The authors make code and trained models available via the provided GitHub repository, supporting reproducibility and further research.
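The volumetric patch embedding mentioned above can be illustrated with a small sketch: a 3D volume is cut into non-overlapping p x p x p patches, each flattened into a token and linearly projected, and the resulting sequence is what global self-attention operates on. The function name, patch size, and the random projection standing in for learned weights are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def volumetric_patchify(vol, p):
    """Split a (C, D, H, W) volume into non-overlapping p x p x p patches,
    flattened into tokens of dimension C * p**3. A sketch of the 3D patch
    embedding described in the summary; names and shapes are illustrative."""
    C, D, H, W = vol.shape
    assert D % p == 0 and H % p == 0 and W % p == 0
    # (C, D/p, p, H/p, p, W/p, p) -> (D/p, H/p, W/p, C, p, p, p)
    x = vol.reshape(C, D // p, p, H // p, p, W // p, p)
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)
    # Flatten the 3D patch grid into a token sequence
    return x.reshape(-1, C * p ** 3)

rng = np.random.default_rng(0)
vol = rng.normal(size=(1, 8, 8, 8))         # tiny single-channel volume
tokens = volumetric_patchify(vol, p=4)      # 2*2*2 = 8 tokens of dim 64
embed = tokens @ rng.normal(size=(64, 32))  # stand-in for a learned projection
print(tokens.shape, embed.shape)            # (8, 64) (8, 32)
```

At realistic resolutions the quadratic cost of attention over these tokens is the usual obstacle, which is presumably why the choice of patch size matters in native 3D.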