Robustness Evaluation of a Foundation Segmentation Model Under Simulated Domain Shifts in Abdominal CT: Implications for Health Digital Twin Deployment
arXiv cs.CV / 4/29/2026
Key Points
- The study evaluates how robust a foundation segmentation model (SAM, ViT-B) remains for spleen segmentation on abdominal CT when subjected to clinically realistic domain shifts.
- Using 1,051 slice samples from 41 CT volumes (Medical Segmentation Decathlon) and a standardized bounding-box protocol, the authors isolate encoder robustness from prompt-related uncertainty.
- Under a clean baseline, SAM achieved a mean Dice score of 0.9145 with a very low failure rate (0.67%), and across multiple simulated perturbations the mean Dice degradation stayed under 0.01.
- Statistical tests found statistically significant but small changes under some perturbation conditions, while McNemar analysis showed no significant increase in failure probability.
- The results support using SAM as a robust foundation baseline, but the authors emphasize that formal robustness characterization is still necessary for trustworthy deployment in health digital twin scenarios.
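
The two quantities reported above, a per-slice Dice score and a paired comparison of failure rates, can be illustrated with a minimal sketch. It is not the authors' code. The summary does not say how a "failure" is defined or which McNemar variant was used, so the `threshold=0.5` cutoff and the exact (binomial) form of the test are assumptions made for illustration, and the function names are hypothetical.

```python
from math import comb


def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


def is_failure(dice, threshold=0.5):
    """Flag a slice as failed. The 0.5 cutoff is an assumption, not from the paper."""
    return dice < threshold


def mcnemar_exact(baseline_fail, shifted_fail):
    """Exact two-sided McNemar test on paired per-slice failure flags.

    Only discordant pairs matter:
      b = slices that fail only under the shifted condition
      c = slices that fail only under the clean baseline
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    pairs = list(zip(baseline_fail, shifted_fail))
    b = sum(1 for base, shift in pairs if not base and shift)
    c = sum(1 for base, shift in pairs if base and not shift)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy 4-pixel masks, not real CT data.
    clean = dice_score([1, 1, 0, 0], [1, 1, 0, 0])
    noisy = dice_score([1, 0, 0, 0], [1, 1, 0, 0])
    print(f"clean Dice={clean:.3f}, shifted Dice={noisy:.3f}")

    # Toy paired failure flags for six slices.
    base_flags = [False, False, False, True, False, False]
    shift_flags = [False, True, False, False, True, True]
    print(mcnemar_exact(base_flags, shift_flags))
```

A small mean Dice drop and a non-significant McNemar result answer different questions: the first concerns average overlap quality, the second whether the same slices flip from success to failure more often than the reverse. That is why the summary reports both.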