Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation
arXiv cs.CV / 3/12/2026
Key Points
- The study benchmarks 11 promptable foundation models for bone and implant segmentation across four anatomical regions (wrist, shoulder, hip, lower leg) using non-iterative 2D and 3D prompting on private and public datasets.
- Pareto-optimal models in 2D are SAM and SAM2.1, and in 3D are nnInteractive and Med-SAM2, with performance highly dependent on the model and prompting strategy.
- Localization accuracy and rater consistency vary by anatomical structure, being higher for simple structures (e.g., wrist bones) and lower for complex structures (e.g., pelvis, tibia, implants).
- Segmentation performance drops when using human prompts compared with ideal prompts derived from reference labels, indicating that results reported with ideal prompts can overestimate real-world, human-driven performance.
- The authors provide open-source code for prompt extraction and model inference. They conclude that selecting the most suitable foundation model for human-driven clinical use remains challenging because the models are sensitive to prompt variations.
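
To make the notion of an "ideal" prompt derived from a reference label concrete, here is a minimal sketch in plain Python. It is not the authors' released code. The function names `box_prompt` and `point_prompt`, the nested-list mask representation, and the `(x_min, y_min, x_max, y_max)` box ordering are all assumptions made for illustration. A human rater would place a box or click by eye, so a human prompt will generally deviate from these label-derived values.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 2D binary mask as rows of 0/1 (illustrative format)


def _foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates of all foreground pixels in row-major order."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    return coords


def box_prompt(mask: Mask) -> Tuple[int, int, int, int]:
    """Tight bounding box around the reference label.

    Returned as (x_min, y_min, x_max, y_max); this ordering is an assumption.
    """
    coords = _foreground(mask)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def point_prompt(mask: Mask) -> Tuple[int, int]:
    """Foreground pixel closest to the label's centroid, as (x, y).

    Snapping to a foreground pixel keeps the point inside the structure
    even when the centroid itself falls outside a non-convex shape.
    """
    coords = _foreground(mask)
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


# Hypothetical toy reference label: a 3x2 block of foreground pixels.
reference = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

ideal_box = box_prompt(reference)
ideal_point = point_prompt(reference)
```

Prompts extracted this way depend on having the reference label, which is not available in clinical use. That is why scores obtained with them can be higher than scores obtained with human-placed prompts.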