Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation
arXiv cs.CV / 3/12/2026
Key Points
- The study benchmarks 11 promptable foundation models for bone and implant segmentation across four anatomical regions (wrist, shoulder, hip, lower leg) using non-iterative 2D and 3D prompting on private and public datasets.
- Pareto-optimal models in 2D are SAM and SAM2.1, and in 3D are nnInteractive and Med-SAM2, with performance highly dependent on the model and prompting strategy.
- Localization accuracy and rater consistency vary by anatomical structure, being higher for simple structures (e.g., wrist bones) and lower for complex structures (e.g., pelvis, tibia, implants).
- Segmentation performance drops when using human prompts compared with ideal prompts derived from reference labels, indicating that evaluations based on ideal prompts can overestimate real-world performance.
- The authors provide open-source code for prompt extraction and model inference and conclude that selecting the most suitable foundation model for human-driven clinical use remains challenging due to sensitivity to prompt variations.
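The gap between ideal and human prompts can be illustrated with a minimal sketch. This is not the authors' released code: the function names (`ideal_prompts`, `jitter_box`, `box_to_mask`, `dice`) are hypothetical, the uniform jitter is only a crude assumed stand-in for human prompt variation, and `box_to_mask` is a toy placeholder for a real promptable model. It shows how a box and point prompt might be derived from a 2D reference label, perturbed, and scored with the Dice coefficient.

```python
import numpy as np


def ideal_prompts(mask):
    """Derive a box (x0, y0, x1, y1) and a point (x, y) from a binary 2D reference mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("reference mask is empty")
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # Pick the foreground pixel nearest the centroid so the point lies inside the structure.
    cy, cx = ys.mean(), xs.mean()
    i = int(np.argmin((ys - cy) ** 2 + (xs - cx) ** 2))
    point = (int(xs[i]), int(ys[i]))
    return box, point


def jitter_box(box, shape, max_shift, rng):
    """Shift each box coordinate by a random integer offset (assumed model of human variation)."""
    h, w = shape
    offsets = rng.integers(-max_shift, max_shift + 1, size=4)
    x0, y0, x1, y1 = (int(c + o) for c, o in zip(box, offsets))
    x0, x1 = sorted((min(max(x0, 0), w - 1), min(max(x1, 0), w - 1)))
    y0, y1 = sorted((min(max(y0, 0), h - 1), min(max(y1, 0), h - 1)))
    return (x0, y0, x1, y1)


def box_to_mask(box, shape):
    """Toy placeholder for a promptable model: fills the box. A real model would go here."""
    x0, y0, x1, y1 = box
    out = np.zeros(shape, dtype=bool)
    out[y0:y1 + 1, x0:x1 + 1] = True
    return out


def dice(pred, ref):
    """Dice coefficient between two binary masks; defined as 1.0 when both are empty."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


# Synthetic rectangular "bone" as the reference label.
reference = np.zeros((64, 64), dtype=bool)
reference[20:40, 10:30] = True

rng = np.random.default_rng(0)
ideal_box, ideal_point = ideal_prompts(reference)
human_like_box = jitter_box(ideal_box, reference.shape, max_shift=5, rng=rng)

dice_ideal = dice(box_to_mask(ideal_box, reference.shape), reference)
dice_human_like = dice(box_to_mask(human_like_box, reference.shape), reference)
```

With a real model in place of `box_to_mask`, comparing `dice_ideal` against scores obtained from several perturbed or rater-drawn prompts gives a rough sense of how sensitive that model is to prompt variation.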