Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark

arXiv cs.CV / 5/1/2026


Key Points

  • Surgical cure for pancreatic ductal adenocarcinoma depends on accurately staging vascular invasion, but computational assessment is hindered by a lack of public data and ambiguous tumor–vessel boundaries that cause high inter-rater variability.
  • The paper introduces the CURVAS-PDACVI Dataset and Challenge, an open benchmark with dense annotations and five independent expert readings per scan, aimed at uncertainty-aware AI for PDAC staging.
  • The paper also proposes a multi-metric evaluation framework that extends beyond spatial overlap to cover probabilistic calibration and targeted vascular invasion assessment.
  • Results from six state-of-the-art methods show that strong average volumetric overlap does not reliably predict performance at clinically critical interfaces, and models optimized for binary segmentation often fail in low-consensus, high-complexity cases.
  • Approaches that explicitly model inter-rater disagreement yield better-calibrated probabilistic maps and improved robustness, underscoring the need for uncertainty-aware models for preoperative decision-making.
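
To make the gap between global overlap and low-consensus regions concrete, here is a minimal Python sketch. It is an illustration only, not the challenge's evaluation code: the toy 1-D "scan", the five rater masks, the prediction, and the helper names `consensus_map` and `dice` are all hypothetical. It builds a per-voxel consensus fraction from five binary annotations, then compares Dice overlap over all voxels with Dice restricted to voxels where the raters disagree.

```python
import numpy as np

def consensus_map(rater_masks):
    """Fraction of raters labelling each voxel as tumor (values in [0, 1])."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in rater_masks])
    return stack.mean(axis=0)

def dice(pred, target):
    """Dice overlap between two binary masks; taken as 1.0 when both are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Hypothetical toy data: 10 voxels, five raters who disagree at the boundary.
raters = [
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
]
soft = consensus_map(raters)          # per-voxel agreement fraction
majority = soft >= 0.5                # majority-vote reference mask
ambiguous = (soft > 0) & (soft < 1)   # voxels without full consensus

# Hypothetical binary model output, shifted by one voxel at the boundary.
pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0], dtype=bool)

global_dice = dice(pred, majority)
ambiguous_dice = dice(pred[ambiguous], majority[ambiguous])
print(f"global Dice: {global_dice:.2f}, low-consensus Dice: {ambiguous_dice:.2f}")
```

On this toy example the overlap computed over all voxels is noticeably higher than the overlap computed only where raters disagree. That is the kind of discrepancy the key points describe, although the benchmark's actual metrics may be defined differently.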

Abstract

Surgical resection remains the only potentially curative treatment for pancreatic ductal adenocarcinoma (PDAC), and eligibility depends on accurate assessment of vascular invasion (VI), i.e., tumor extension into adjacent critical vessels. Despite its importance for preoperative staging and surgical planning, computational VI assessment remains underexplored. Two major challenges are the lack of public datasets and the diagnostic ambiguity at the tumor–vessel interface, which leads to substantial inter-rater variability even among expert radiologists. To address these limitations, we introduce the CURVAS-PDACVI Dataset and Challenge, an open benchmark for uncertainty-aware AI in PDAC staging based on a densely annotated dataset with five independent expert annotations per scan. We also propose a multi-metric evaluation framework that extends beyond spatial overlap to include probabilistic calibration and VI assessment. Evaluation of six state-of-the-art methods shows that strong global volumetric overlap does not necessarily translate into reliable performance at clinically critical tumor–vessel interfaces. In particular, methods optimized for binary segmentation perform competitively on average overlap metrics, but often degrade in high-complexity cases with low expert consensus, either collapsing in volume or overextending at uncertain boundaries. In contrast, methods that model inter-rater disagreement produce better-calibrated probabilistic maps and show greater robustness in these ambiguous cases. The benchmark highlights the limitations of volumetric accuracy as a proxy for localized surgical utility, motivating uncertainty-aware probabilistic models for preoperative decision-making.
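
The abstract does not say which calibration metric the framework uses. As a hedged illustration of what "probabilistic calibration" can mean when several raters are available, the Python sketch below computes a binned expected calibration error (ECE) of a predicted tumor-probability map against the per-voxel fraction of raters who marked tumor. The function name `expected_calibration_error`, the bin count, and the toy arrays are assumptions made for this example, not details from the paper.

```python
import numpy as np

def expected_calibration_error(probs, target_freq, n_bins=10):
    """Binned ECE: size-weighted gap between mean predicted probability
    and mean observed tumor frequency within each probability bin."""
    probs = np.asarray(probs, dtype=float).ravel()
    target_freq = np.asarray(target_freq, dtype=float).ravel()
    # Assign each voxel to an equal-width bin; p == 1.0 falls in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        gap = abs(probs[in_bin].mean() - target_freq[in_bin].mean())
        ece += in_bin.mean() * gap  # in_bin.mean() is the bin's share of voxels
    return float(ece)

# Hypothetical per-voxel fraction of five raters who marked tumor.
rater_fraction = np.array([0, 0, 0.6, 1, 1, 1, 0.6, 0.2, 0, 0])

# A hard binary output is fully confident everywhere, including at the boundary.
hard_output = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0], dtype=float)
# A probabilistic output that mirrors rater disagreement exactly.
soft_output = rater_fraction.copy()

ece_hard = expected_calibration_error(hard_output, rater_fraction)
ece_soft = expected_calibration_error(soft_output, rater_fraction)
print(f"ECE hard: {ece_hard:.2f}, ECE soft: {ece_soft:.2f}")
```

In this toy setting the hard output is penalized for being fully confident on voxels where the raters disagree, while the output that mirrors rater disagreement has zero calibration error. This matches the direction of the abstract's finding but is not a reproduction of its results.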