AI Navigate

Multiscale Structure-Guided Latent Diffusion for Multimodal MRI Translation

arXiv cs.AI / 3/16/2026

📰 News · Models & Research

Key Points

  • A latent diffusion-based framework called MSG-LDM is proposed for multimodal MRI translation to address anatomical inconsistencies and degraded texture when some modalities are missing.
  • The method introduces a style–structure disentanglement mechanism in latent space to separate modality-specific style features from shared structural representations, and jointly models low-frequency anatomical layouts and high-frequency boundary details in a multi-scale feature space.
  • During structure disentanglement, high-frequency structural information is explicitly injected to enhance feature representations, guiding the model toward fine-grained structural cues while it learns modality-invariant, low-frequency anatomy; a style-consistency loss and a structure-aware loss further stabilize the learned representations.
  • Experiments on BraTS2020 and WMH datasets show MSG-LDM outperforms existing MRI synthesis approaches in reconstructing complete structures, with the code publicly available on GitHub.
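The paper does not spell out its loss formulations here, but the disentanglement idea in the key points can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the channel-wise latent split, the L2 form of both losses, and the crude box-blur high-pass used to extract "high-frequency" structure are hypothetical stand-ins, not MSG-LDM's actual design.

```python
import numpy as np

def split_latent(z, style_dim):
    """Hypothetical split of a latent (B, C, H, W) into style and structure codes.

    Assumes the first `style_dim` channels carry modality-specific style and
    the remaining channels carry shared anatomical structure.
    """
    return z[:, :style_dim], z[:, style_dim:]

def high_frequency(x):
    """Crude high-pass: residual after a 3x3 box blur (edge-padded).

    A stand-in for whatever high-frequency extractor the paper uses.
    """
    pad = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="edge")
    h, w = x.shape[-2], x.shape[-1]
    blur = sum(pad[..., i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return x - blur

def style_consistency_loss(style_a, style_b):
    """Assumed L2 penalty pulling style codes of the same modality together."""
    return np.mean((style_a - style_b) ** 2)

def structure_aware_loss(struct_a, struct_b, w_hf=0.5):
    """Assumed structure loss: low-frequency agreement plus a weighted
    high-frequency (boundary) agreement term across modalities."""
    lf = np.mean((struct_a - struct_b) ** 2)
    hf = np.mean((high_frequency(struct_a) - high_frequency(struct_b)) ** 2)
    return lf + w_hf * hf

# Toy usage: two latents of shape (batch=2, channels=8, 4, 4)
rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=(2, 8, 4, 4)), rng.normal(size=(2, 8, 4, 4))
style_a, struct_a = split_latent(z_a, style_dim=3)
style_b, struct_b = split_latent(z_b, style_dim=3)
total = style_consistency_loss(style_a, style_b) + structure_aware_loss(struct_a, struct_b)
```

In a real training loop these terms would be differentiable (e.g. in PyTorch) and added to the diffusion objective; the sketch only shows how the two codes and the two auxiliary penalties relate.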

Abstract

Although diffusion models have achieved remarkable progress in multi-modal magnetic resonance imaging (MRI) translation tasks, existing methods still tend to suffer from anatomical inconsistencies or degraded texture details when handling arbitrary missing-modality scenarios. To address these issues, we propose a latent diffusion-based multi-modal MRI translation framework, termed MSG-LDM. By leveraging the available modalities, the proposed method infers complete structural information, which preserves reliable boundary details. Specifically, we introduce a style–structure disentanglement mechanism in the latent space, which explicitly separates modality-specific style features from shared structural representations, and jointly models low-frequency anatomical layouts and high-frequency boundary details in a multi-scale feature space. During the structure disentanglement stage, high-frequency structural information is explicitly incorporated to enhance feature representations, guiding the model to focus on fine-grained structural cues while learning modality-invariant low-frequency anatomical representations. Furthermore, to reduce interference from modality-specific styles and improve the stability of structure representations, we design a style consistency loss and a structure-aware loss. Extensive experiments on the BraTS2020 and WMH datasets demonstrate that the proposed method outperforms existing MRI synthesis approaches, particularly in reconstructing complete structures. The source code is publicly available at https://github.com/ziyi-start/MSG-LDM.
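The abstract's "jointly models low-frequency anatomical layouts and high-frequency boundary details in a multi-scale feature space" suggests a pyramid-style decomposition. The sketch below is one common way to realize such a split (a single Laplacian-pyramid level), offered purely as an assumed illustration; the average-pool downsampling and nearest-neighbour upsampling are my choices, not necessarily the paper's.

```python
import numpy as np

def downsample(x):
    """2x2 average pool on an (H, W) array: a simple low-pass + subsample."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsample back to the original resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def two_scale_split(x):
    """One Laplacian-pyramid level: low-frequency layout + high-frequency residual."""
    low = downsample(x)           # coarse anatomical layout
    high = x - upsample(low)      # boundary / texture detail
    return low, high

def reconstruct(low, high):
    """The split is exactly invertible by construction."""
    return upsample(low) + high
```

Stacking this split recursively gives a full multi-scale representation; a model can then regularize the low band for modality-invariant anatomy and the high band for boundary fidelity, which is the division of labor the abstract describes.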