AI Navigate

LGESynthNet: Controlled Scar Synthesis for Improved Scar Segmentation in Cardiac LGE-MRI Imaging

arXiv cs.AI / 3/20/2026

Key Points

  • LGESynthNet is a latent diffusion–based framework for controllable enhancement synthesis in LGE-MRI, enabling explicit control over lesion size, location, and transmural extent.
  • It uses an inpainting formulation with a ControlNet-based architecture, integrating a reward model for conditioning supervision, a captioning module for anatomically descriptive prompts, and a biomedical text encoder.
  • Trained on 429 images (79 patients), it generates realistic, anatomically coherent samples suitable for augmentation.
  • A quality control filter selects outputs with high conditioning fidelity to ensure useful augmentation data.
  • Used as training augmentation, the filtered samples improve downstream segmentation and detection performance by up to 6 and 20 points, respectively.
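
The quality control step above can be pictured as a simple overlap check. The summary does not say how conditioning fidelity is measured, so the following is only a minimal sketch under stated assumptions: fidelity is taken to be the Dice overlap between the requested enhancement mask and a mask recovered from the generated image (for example by a segmentation model), and the 0.7 threshold and the names `dice` and `filter_by_fidelity` are hypothetical.

```python
from typing import List, Sequence, Tuple

# 2D binary mask as nested sequences; 1 marks enhancement (scar) pixels.
Mask = Sequence[Sequence[int]]


def dice(a: Mask, b: Mask) -> float:
    """Dice overlap between two binary masks with the same number of pixels."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(flat_a, flat_b) if x and y)
    total = sum(1 for x in flat_a if x) + sum(1 for y in flat_b if y)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def filter_by_fidelity(
    samples: List[Tuple[Mask, Mask]], threshold: float = 0.7
) -> List[int]:
    """Return indices of samples whose recovered mask matches the request.

    Each sample is (requested_mask, recovered_mask). The Dice criterion and
    the default threshold are assumptions, not the paper's stated method.
    """
    return [
        i for i, (requested, recovered) in enumerate(samples)
        if dice(requested, recovered) >= threshold
    ]
```

For instance, a sample whose recovered mask coincides with the requested one has Dice 1.0 and is kept, while a sample with no overlapping pixels has Dice 0.0 and is discarded.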

Abstract

Segmentation of enhancement in LGE cardiac MRI is critical for diagnosing various ischemic and non-ischemic cardiomyopathies. However, creating pixel-level annotations for these images is challenging and labor-intensive, which limits the availability of annotated data. Generative models, particularly diffusion models, offer promise for synthetic data generation, yet many rely on large training datasets and often struggle with fine-grained conditioning control, especially for small or localized features. We introduce LGESynthNet, a latent diffusion-based framework for controllable enhancement synthesis that enables explicit control over size, location, and transmural extent. Formulated as inpainting using a ControlNet-based architecture, the model integrates: (a) a reward model for conditioning-specific supervision, (b) a captioning module for anatomically descriptive text prompts, and (c) a biomedical text encoder. Trained on just 429 images (79 patients), it produces realistic, anatomically coherent samples. A quality control filter selects outputs with high conditioning fidelity; when used for training augmentation, these outputs improve downstream segmentation and detection performance by up to 6 and 20 points, respectively.
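
To make the three control axes (size, location, transmural extent) and the idea of an anatomically descriptive prompt more concrete, here is a small illustrative sketch. It is not the paper's captioning module: the `ScarCondition` type, the `describe` function, the size buckets (10% and 30% of myocardial area), and the prompt wording are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScarCondition:
    """Hypothetical container for the three controllable attributes."""
    size_fraction: float   # enhancement area / myocardium area, in [0, 1]
    location: str          # free-text wall region, e.g. "basal inferior"
    transmurality: float   # fraction of wall thickness involved, in [0, 1]


def describe(cond: ScarCondition) -> str:
    """Turn a condition into a short text prompt (illustrative wording only)."""
    for name, value in (("size_fraction", cond.size_fraction),
                        ("transmurality", cond.transmurality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Assumed size buckets; the source does not define any thresholds.
    if cond.size_fraction < 0.10:
        size = "small"
    elif cond.size_fraction < 0.30:
        size = "moderate"
    else:
        size = "large"
    percent = round(cond.transmurality * 100)
    return (f"{size} enhancement in the {cond.location} wall, "
            f"about {percent}% transmural extent")
```

A prompt of this kind would then be passed to a text encoder alongside the spatial conditioning; how LGESynthNet combines the two is not detailed in this summary.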