ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression

arXiv cs.CV / 4/16/2026

Key Points

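  • The paper presents ADP-DiT, a diffusion transformer that generates follow-up MRI for Alzheimer's disease under text conditioning on the follow-up time point (interval) and on each participant's clinical, demographic, and neuropsychological information.
  • Two text encoders, OpenCLIP and T5, embed the natural-language prompt; on the DiT side, cross-attention provides fine-grained guidance and adaptive layer normalization provides global modulation.
  • On the image side, rotary positional embeddings and diffusion in the pre-trained SDXL-VAE latent space are used to improve high-resolution reconstruction and anatomical fidelity.
  • On 3T T1-weighted images (712 participants, 3,321 scans), the model achieves SSIM 0.8739 and PSNR 29.32 dB, improving on the DiT baseline by +0.1087 SSIM and +6.08 dB PSNR, and it is shown to capture progression-related changes such as ventricular enlargement and hippocampal shrinkage.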
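The reported gains can be turned back into the baseline scores they imply, and PSNR itself is short enough to state in code. The following is a minimal sketch in plain Python, not the paper's evaluation code: the `psnr` helper, the [0, 1] intensity range, and the toy pixel values are assumptions made for illustration; only the four reported numbers come from the summary above.

```python
import math


def psnr(reference, generated, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes intensities lie in [0, max_value]. This is the standard textbook
    definition, not necessarily the exact protocol used in the paper.
    """
    if len(reference) != len(generated):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Values reported for ADP-DiT and its improvement over the DiT baseline.
ADP_DIT_SSIM = 0.8739
ADP_DIT_PSNR_DB = 29.32
SSIM_GAIN = 0.1087
PSNR_GAIN_DB = 6.08

# Baseline scores implied by subtracting the reported gains.
baseline_ssim = round(ADP_DIT_SSIM - SSIM_GAIN, 4)
baseline_psnr_db = round(ADP_DIT_PSNR_DB - PSNR_GAIN_DB, 2)

# PSNR is logarithmic in MSE, so a dB gain maps to a multiplicative MSE reduction.
mse_reduction_factor = 10 ** (PSNR_GAIN_DB / 10)

# Toy example (made-up pixel values) exercising the helper.
toy_reference = [0.0, 0.5, 1.0, 0.25]
toy_generated = [0.1, 0.5, 0.9, 0.25]
toy_psnr = psnr(toy_reference, toy_generated)

print(f"implied baseline SSIM: {baseline_ssim}")
print(f"implied baseline PSNR: {baseline_psnr_db} dB")
print(f"MSE reduction factor:  {mse_reduction_factor:.2f}")
print(f"toy PSNR:              {toy_psnr:.2f} dB")
```

Under these definitions the implied DiT baseline is SSIM 0.7652 and PSNR 23.24 dB, and the +6.08 dB gain corresponds to roughly a fourfold reduction in mean squared error, provided both models were scored with the same intensity range.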

Abstract

Alzheimer's disease (AD) progresses heterogeneously across individuals, motivating subject-specific synthesis of follow-up magnetic resonance imaging (MRI) to support progression assessment. While Diffusion Transformers (DiT), an emerging class of transformer-based diffusion models, offer a scalable backbone for image synthesis, longitudinal AD MRI generation with clinically interpretable control over follow-up time and participant metadata remains underexplored. We present ADP-DiT, an interval-aware, clinically text-conditioned diffusion transformer for longitudinal AD MRI synthesis. ADP-DiT encodes the follow-up interval together with multi-domain demographic, diagnostic (CN/MCI/AD), and neuropsychological information as a natural-language prompt, enabling time-specific control beyond coarse diagnostic stages. To inject this conditioning effectively, we use dual text encoders: OpenCLIP for vision-language alignment and T5 for richer clinical-language understanding. Their embeddings are fused into DiT through cross-attention for fine-grained guidance and adaptive layer normalization for global modulation. We further enhance anatomical fidelity by applying rotary positional embeddings to image tokens and performing diffusion in a pre-trained SDXL-VAE latent space to enable efficient high-resolution reconstruction. On 3,321 longitudinal 3T T1-weighted scans from 712 participants (259,038 image slices), ADP-DiT achieves SSIM 0.8739 and PSNR 29.32 dB, improving over a DiT baseline by +0.1087 SSIM and +6.08 dB PSNR while capturing progression-related changes such as ventricular enlargement and hippocampal shrinkage. These results suggest that integrating comprehensive, subject-specific clinical conditions with diffusion transformer architectures can improve longitudinal AD MRI synthesis.
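To make the idea of an interval-aware clinical prompt concrete, here is a minimal sketch in plain Python of how a follow-up interval and participant metadata might be combined into one natural-language string. It is an illustration only: the abstract does not give the prompt template, so the `ParticipantInfo` fields, the `build_prompt` function, the wording of the sentence, and the use of MMSE as the neuropsychological score are all hypothetical. Only the CN/MCI/AD diagnostic codes come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

# Diagnostic codes named in the abstract, with their usual expansions.
DIAGNOSIS_LABELS = {
    "CN": "cognitively normal",
    "MCI": "mild cognitive impairment",
    "AD": "Alzheimer's disease",
}


@dataclass
class ParticipantInfo:
    """Hypothetical per-participant metadata; field names are illustrative."""
    age_years: int
    sex: str
    diagnosis: str                    # one of the keys in DIAGNOSIS_LABELS
    mmse_score: Optional[int] = None  # assumed example of a neuropsychological score


def build_prompt(info: ParticipantInfo, follow_up_months: int) -> str:
    """Compose a text prompt from metadata and a follow-up interval.

    The template is an assumption, not the one used by ADP-DiT.
    """
    if info.diagnosis not in DIAGNOSIS_LABELS:
        raise ValueError(f"unknown diagnosis code: {info.diagnosis!r}")
    if follow_up_months < 0:
        raise ValueError("follow-up interval must be non-negative")
    parts = [
        f"T1-weighted brain MRI of a {info.age_years}-year-old {info.sex} participant",
        f"diagnosis: {DIAGNOSIS_LABELS[info.diagnosis]} ({info.diagnosis})",
    ]
    if info.mmse_score is not None:
        parts.append(f"MMSE score: {info.mmse_score}")
    parts.append(f"follow-up interval: {follow_up_months} months after baseline")
    return ", ".join(parts) + "."


# Same participant, three different intervals: only the interval clause changes.
participant = ParticipantInfo(age_years=74, sex="female", diagnosis="MCI", mmse_score=26)
prompts = [build_prompt(participant, months) for months in (6, 12, 24)]
for prompt in prompts:
    print(prompt)
```

Holding the metadata fixed and varying only `follow_up_months` yields prompts that differ solely in the interval clause, which is the kind of time-specific control the abstract describes. In the described pipeline such a string would then be embedded by both text encoders; that step is omitted here.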
