IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
arXiv cs.CL / 4/10/2026
Key Points
- The IatroBench study uses 60 pre-registered clinical scenarios to quantify how frontier AI models withhold or degrade guidance on iatrogenic harm (such as medication tapering) even when the model demonstrably "knows" the correct information.
- Results show identity-contingent withholding: when the same clinical question is framed as physician-directed versus layperson-directed, models provide more complete guidance to the apparent physician while withholding or degrading the same content in the lay framing.
- The study decouples commission harm (recommending unsafe actions) from omission harm (withholding necessary guidance), finding a measurable gap between the two and a strong statistical effect of layperson framing.
- Multiple distinct failure modes emerge across models, including trained withholding (seen in the most safety-invested model), genuine incompetence (in another), and over-aggressive post-generation filtering that disproportionately strips physician-appropriate content.
- The evaluation also shows that common LLM-based judges share the blind spots of the underlying training and evaluation pipelines, agreeing poorly with physicians on omission harm: many responses that physicians rate as unsafe by omission are not flagged.
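The paired-framing evaluation described above can be sketched as a simple scoring comparison. This is an illustrative reconstruction, not the paper's actual implementation: the `Rating` fields, scales, and the `decoupling_gap` function are all assumptions for the sake of example.

```python
# Hypothetical sketch: compare omission vs. commission harm across
# physician-framed and layperson-framed versions of the same scenarios.
# All names and the 0..1 harm scales are illustrative assumptions.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rating:
    commission: float  # unsafe-action harm, 0 (safe) .. 1 (harmful)
    omission: float    # withheld-guidance harm, 0 (complete) .. 1 (harmful)


def decoupling_gap(physician: list[Rating], lay: list[Rating]) -> dict:
    """Mean harm difference (lay minus physician) per harm dimension."""
    return {
        "omission_gap": mean(r.omission for r in lay)
        - mean(r.omission for r in physician),
        "commission_gap": mean(r.commission for r in lay)
        - mean(r.commission for r in physician),
    }


# Toy data: the lay framing withholds more (higher omission harm)
# while commission harm stays flat -- the decoupling pattern reported.
phys = [Rating(commission=0.1, omission=0.2), Rating(commission=0.0, omission=0.1)]
lay = [Rating(commission=0.1, omission=0.7), Rating(commission=0.0, omission=0.6)]
print(decoupling_gap(phys, lay))
```

A positive `omission_gap` with a near-zero `commission_gap` would correspond to the identity-contingent withholding pattern the study reports: the lay framing does not make answers more dangerous, it makes them less complete.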