IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

arXiv cs.CL / 4/10/2026


Key Points

  • IatroBench uses 60 pre-registered clinical scenarios to quantify how frontier AI models withhold or degrade clinically necessary guidance (e.g., medication-tapering protocols) even when the model demonstrably "knows" the correct information.
  • Results show identity-contingent withholding: when the same clinical question is framed as physician-directed versus layperson-directed, models give the physician better guidance, and hit rates on safety-colliding actions drop sharply under lay framing while non-colliding actions are unaffected.
  • The study scores commission harm (unsafe actions) and omission harm (withholding necessary guidance) on separate axes, finding a significant decoupling gap (+0.38, p = 0.003) in favor of physician framing; a minimal sketch of the paired comparison appears after this list.
  • Three distinct failure modes separate across models: trained withholding (Opus, the most safety-invested model), incompetence (Llama 4), and over-aggressive post-generation filtering (GPT-5.2) that strips physician-directed responses at a far higher rate than layperson-directed ones.
  • The evaluation also shows that standard LLM-based judges share the training pipeline's blind spot: the judge assigns no omission harm (OH = 0) to 73% of responses that physicians score as harmful by omission (OH >= 1), with near-chance agreement (kappa = 0.045).
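
For concreteness, here is a minimal sketch of the paired framing comparison, assuming the decoupling gap is the mean per-scenario difference in omission-harm scores between layperson and physician framings; the paper's exact estimator may differ, and the data below are invented for illustration.

```python
# Hypothetical sketch (not the paper's code): computes a decoupling gap as the
# mean paired difference in omission-harm (OH) scores between layperson and
# physician framings of the same scenarios, with a paired t-test.
import numpy as np
from scipy.stats import ttest_rel

def decoupling_gap(oh_lay, oh_phys):
    """oh_lay, oh_phys: per-scenario OH scores (0-4) under each framing."""
    diff = np.asarray(oh_lay, float) - np.asarray(oh_phys, float)
    gap = diff.mean()                  # positive => more withholding for laypeople
    _, p = ttest_rel(oh_lay, oh_phys)  # paired test across matched scenarios
    return gap, p

# Toy data for 10 scenarios; the study itself uses 60 scenarios x 6 models.
rng = np.random.default_rng(0)
oh_phys = rng.integers(0, 3, size=10)
oh_lay = np.clip(oh_phys + rng.integers(0, 2, size=10), 0, 4)
gap, p = decoupling_gap(oh_lay, oh_phys)
print(f"decoupling gap = {gap:+.2f} (paired t-test p = {p:.3f})")
```

The pairing is the load-bearing design choice: because both framings are scored on the same underlying scenario, any gap isolates the effect of the asker's stated identity rather than differences in clinical difficulty.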

Abstract

Ask a frontier model how to taper six milligrams of alprazolam (psychiatrist retired, ten days of pills left, abrupt cessation causes seizures) and it tells her to call the psychiatrist she just explained does not exist. Change one word ("I'm a psychiatrist; a patient presents with...") and the same model, same weights, same inference pass produces a textbook Ashton Manual taper with diazepam equivalence, anticonvulsant coverage, and monitoring thresholds. The knowledge was there; the model withheld it. IatroBench measures this gap. Sixty pre-registered clinical scenarios, six frontier models, 3,600 responses, scored on two axes (commission harm, CH 0-3; omission harm, OH 0-4) through a structured-evaluation pipeline validated against physician scoring (kappa_w = 0.571, within-1 agreement 96%). The central finding is identity-contingent withholding: match the same clinical question in physician vs. layperson framing and all five testable models provide better guidance to the physician (decoupling gap +0.38, p = 0.003; binary hit rates on safety-colliding actions drop 13.1 percentage points in layperson framing, p < 0.0001, while non-colliding actions show no change). The gap is widest for the model with the heaviest safety investment (Opus, +0.65). Three failure modes separate cleanly: trained withholding (Opus), incompetence (Llama 4), and indiscriminate content filtering (GPT-5.2, whose post-generation filter strips physician responses at 9x the layperson rate because they contain denser pharmacological tokens). The standard LLM judge assigns OH = 0 to 73% of responses a physician scores OH >= 1 (kappa = 0.045); the evaluation apparatus has the same blind spot as the training apparatus. Every scenario targets someone who has already exhausted the standard referrals.