TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

arXiv cs.LG / 4/20/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

TwinTrack is introduced as a post-hoc multi-rater calibration framework for medical image segmentation, targeting the ambiguity caused by inter-expert disagreement.
It calibrates ensemble segmentation probabilities to the empirical mean human response (MHR), defined as the fraction of expert annotators labeling each voxel as tumor.
The resulting calibrated probabilities are directly interpretable as the expected proportion of annotators who would assign the tumor label, explicitly reflecting uncertainty.
The calibration method is described as simple and requiring only a small multi-rater calibration dataset.
Evaluations on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark show consistent improvements in calibration metrics versus standard approaches.

Abstract

Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities to the empirical mean human response (MHR) -the fraction of expert annotators labeling a voxel as tumor. Calibrated probabilities are thus directly interpretable as the expected proportion of annotators assigning the tumor label, explicitly modeling inter-rater disagreement. The proposed post-hoc calibration procedure is simple and requires only a small multi-rater calibration set. It consistently improves calibration metrics over standard approaches when evaluated on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

Dev.to

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Dev.to

Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else

Dev.to

Local LLM Beginner’s Guide (Mac - Apple Silicon)

Reddit r/artificial

Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals

Dev.to

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

Key Points

Abstract

Related Articles

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else

Local LLM Beginner’s Guide (Mac - Apple Silicon)

Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer