CLIMB: Controllable Longitudinal Brain Image Generation using Mamba-based Latent Diffusion Model and Gaussian-aligned Autoencoder

arXiv cs.AI / 4/20/2026

Key Points

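  • CLIMB is proposed as a state-space-based latent diffusion model (LDM) that takes a baseline MRI scan and its acquisition age as input to generate and predict longitudinal structural changes in the brain.
  • It strengthens the temporal modeling of anatomical change by conditioning on multiple variables, including projected age, gender, disease status, genetic information, and brain structure volumes.
  • Compared with conventional LDMs that rely on self-attention, it adopts a state space model, aiming to cut computational cost substantially while maintaining high-quality image generation.
  • It also introduces a Gaussian-aligned autoencoder that extracts latent representations while suppressing the sampling noise found in conventional variational autoencoders.
  • Evaluated on the Alzheimer's Disease Neuroimaging Initiative dataset (6,306 scans from 1,390 participants), it reportedly achieves a structural similarity index (SSIM) of 0.9433 when generated images are compared with real MRI scans.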
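
For readers unfamiliar with the SSIM figure quoted above, the following is a minimal sketch of the metric, not the paper's evaluation code. It assumes a single global window over flattened intensities in [0, 1], whereas standard SSIM averages the same expression over local windows. The function name `global_ssim` and the defaults `k1=0.01`, `k2=0.03` (the commonly used constants) are illustrative choices.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM for two equal-length lists of intensities.

    Assumption: one global window; standard SSIM uses local windows
    and averages the per-window scores.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    # Small constants that stabilise the ratios when the means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance term compares the means; the second term compares contrast and structure.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x * mu_x + mu_y * mu_y + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


real = [0.1, 0.4, 0.8, 0.3]
generated = [0.1, 0.5, 0.7, 0.3]
score = global_ssim(real, generated)  # at most 1.0; identical inputs give 1.0
```

A score of 1 means the two inputs are identical under this measure, so a reported value of 0.9433 indicates generated scans that are structurally close to the real follow-up scans.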

Abstract

Latent diffusion models have emerged as powerful generative models in medical imaging, enabling the synthesis of high-quality brain magnetic resonance imaging scans. In particular, predicting the evolution of a patient's brain can aid in early intervention, prognosis, and treatment planning. In this study, we introduce CLIMB, Controllable Longitudinal brain Image generation via state-space-based latent diffusion model, an advanced framework for modeling temporal changes in brain structure. CLIMB is designed to model the structural evolution of the brain over time, utilizing a baseline MRI scan and its acquisition age as foundational inputs. Additionally, multiple conditional variables, including projected age, gender, disease status, genetic information, and brain structure volumes, are incorporated to enhance the temporal modeling of anatomical changes. Existing LDM methods rely on self-attention modules, which effectively capture contextual information from input images but are computationally expensive. In contrast, our approach leverages Mamba, a state space model architecture that substantially reduces computational overhead while preserving high-quality image synthesis. Furthermore, we introduce a Gaussian-aligned autoencoder that extracts latent representations conforming to prior distributions without the sampling noise inherent in conventional variational autoencoders. We train and evaluate our proposed model on the Alzheimer's Disease Neuroimaging Initiative dataset, consisting of 6,306 MRI scans from 1,390 participants. By comparing generated images with real MRI scans, CLIMB achieves a structural similarity index of 0.9433, demonstrating notable improvements over existing methods.
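
To illustrate why a state space model can be cheaper than self-attention, here is a minimal sketch of a linear state space recurrence. It is an assumption-laden toy, not the paper's architecture: it uses a scalar state with fixed parameters `a`, `b`, and `c`, whereas Mamba makes its parameters input-dependent and operates on high-dimensional states. The function name `ssm_scan` is hypothetical.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence.

    Toy scalar recurrence with fixed parameters (an assumption for
    illustration); each input is visited exactly once.
    """
    h = 0.0  # hidden state carried from one step to the next
    outputs = []
    for x in inputs:
        h = a * h + b * x  # fold the new input into the running state
        outputs.append(c * h)  # read the output from the current state
    return outputs


# An impulse followed by zeros decays geometrically with factor a.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5)
```

The loop does a constant amount of work per element, so the cost grows linearly with sequence length. Self-attention compares every position with every other position, so its cost grows quadratically.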