CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization

arXiv cs.CV / 4/22/2026

Key Points

  • The paper introduces CrossPan, a multi-institution benchmark with 1,386 3D pancreas MRI scans across three sequences (T1-weighted, T2-weighted, and Out-of-Phase) to study cross-sequence generalization systematically.
  • It finds that cross-sequence domain shifts are the dominant failure mode: models with high in-sequence Dice scores (>0.85) can collapse to near-zero performance (<0.02) when transferred to different sequences.
  • State-of-the-art domain generalization methods deliver little improvement under these physics-driven contrast inversions, while foundation models such as MedSAM2 retain moderate zero-shot performance due to contrast-invariant shape priors.
  • Semi-supervised learning helps only when intensity distributions remain stable, and it becomes unstable on sequences exhibiting high intra-organ variability.
  • Overall, the study identifies cross-sequence generalization as a primary barrier to clinically deployable pancreas MRI segmentation, more than architectural choices or center diversity.

Abstract

Automatic pancreas segmentation is fundamental to abdominal MRI analysis, yet deep learning models trained on one MRI sequence often fail catastrophically when applied to another, a challenge that has received little systematic investigation. We introduce CrossPan, a multi-institutional benchmark comprising 1,386 3D scans across three routinely acquired sequences (T1-weighted, T2-weighted, and Out-of-Phase) from eight centers. Our experiments reveal three key findings. First, cross-sequence domain shifts are far more severe than cross-center variability: models achieving Dice scores above 0.85 in-domain collapse to near-zero (<0.02) when transferred across sequences. Second, state-of-the-art domain generalization methods provide negligible benefit under these physics-driven contrast inversions, whereas foundation models like MedSAM2 maintain moderate zero-shot performance through contrast-invariant shape priors. Third, semi-supervised learning offers gains only under stable intensity distributions and becomes unstable on sequences with high intra-organ variability. These results establish cross-sequence generalization, rather than model architecture or center diversity, as the primary barrier to clinically deployable pancreas MRI segmentation. Dataset and code are available at https://crosspan.netlify.app/.
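
For readers unfamiliar with the metric, the Dice score quoted above measures overlap between a predicted mask and a reference mask, 2|A ∩ B| / (|A| + |B|). The sketch below is illustrative only and is not taken from the CrossPan code: it computes Dice with NumPy and then uses a toy intensity-threshold "segmenter" on a synthetic volume to show how an inverted organ/background contrast can push Dice from near 1 to near 0. The function names (`dice_score`, `threshold_segmenter`), the synthetic volume, and the modeling of a sequence change as a simple intensity inversion are all assumptions made for illustration; real MRI contrast differences and learned models are far more complex.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return float(2.0 * intersection / denom)


def threshold_segmenter(volume, threshold=0.5):
    """Hypothetical stand-in for a model that learned 'organ = bright'."""
    return volume > threshold


# Synthetic 3D volume: a cubic "organ" inside a 16x16x16 grid.
rng = np.random.default_rng(0)
mask = np.zeros((16, 16, 16), dtype=bool)
mask[4:12, 4:12, 4:12] = True

# Assumed "training" sequence: organ bright (~0.8), background dark (~0.2).
bright_organ = np.where(mask, 0.8, 0.2) + rng.normal(0.0, 0.02, mask.shape)

# Assumed "unseen" sequence: same anatomy with inverted contrast.
dark_organ = 1.0 - bright_organ

in_sequence = dice_score(threshold_segmenter(bright_organ), mask)
cross_sequence = dice_score(threshold_segmenter(dark_organ), mask)

print(f"in-sequence Dice:    {in_sequence:.2f}")
print(f"cross-sequence Dice: {cross_sequence:.2f}")
```

In this toy setting the intensity rule that works on the first volume selects the background on the inverted one, so the overlap with the true mask vanishes. That mirrors, in a highly simplified way, why a shape-based prior could survive a contrast inversion that an intensity-based rule cannot.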