Evaluating Patient Safety Risks in Generative AI: Development and Validation of a FMECA Framework for Generated Clinical Content
arXiv cs.AI / 5/7/2026
💬 Opinion · Models & Research
Key Points
- The paper addresses a gap in structured patient-safety risk assessment methods for LLM-generated clinical text, proposing an FMECA-based approach tailored to generated summaries.
- An interdisciplinary panel developed a taxonomy of 14 failure modes and adapted standard FMECA dimensions (occurrence, severity, detectability) into 5-point ordinal scales for scoring risk.
- The framework was validated by having reviewers annotate 36 generated discharge summaries (from four patients) produced by an open LLM (GPT-OSS 120B) using real clinical data from Geneva University Hospitals.
- Results show improved inter-rater reliability across annotation rounds, with moderate-to-substantial agreement for failure mode identification and good agreement for severity and detectability scoring.
- Usability and content validity were supported by an adapted System Usability Scale, yielding a mean SUS score of 79.2/100 and high evaluator confidence.
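The three FMECA dimensions above can be combined into a single criticality score. The sketch below is illustrative only, assuming the classic FMECA risk priority number (RPN, the product of the three 1–5 ratings) as the aggregation rule; the paper's actual scoring procedure and scale orientations are not specified here, and the class, field names, and example failure modes are hypothetical.

```python
# Minimal FMECA-style criticality sketch for failure modes in an
# LLM-generated discharge summary. ASSUMPTIONS: the three dimensions use
# 5-point ordinal scales (higher = worse) and are aggregated with the
# classic risk priority number (occurrence x severity x detectability);
# the paper may score risk differently.

from dataclasses import dataclass


@dataclass
class FailureModeRating:
    name: str
    occurrence: int     # 1 (rare) .. 5 (frequent)       -- assumed orientation
    severity: int       # 1 (negligible) .. 5 (critical) -- assumed orientation
    detectability: int  # 1 (easily caught) .. 5 (likely missed)

    def __post_init__(self) -> None:
        for dim in ("occurrence", "severity", "detectability"):
            v = getattr(self, dim)
            if not 1 <= v <= 5:
                raise ValueError(f"{dim} must be on the 1-5 scale, got {v}")

    @property
    def rpn(self) -> int:
        # Classic FMECA risk priority number: product of the three ratings.
        return self.occurrence * self.severity * self.detectability


# Hypothetical failure modes, ranked by criticality (highest RPN first).
ratings = [
    FailureModeRating("hallucinated medication", 2, 5, 4),
    FailureModeRating("omitted allergy", 3, 5, 3),
    FailureModeRating("minor formatting error", 4, 1, 1),
]
ranked = sorted(ratings, key=lambda r: r.rpn, reverse=True)
for r in ranked:
    print(f"{r.name}: RPN={r.rpn}")
```

Ranking by the RPN product is one common FMECA convention; ordinal scales strictly only support order, so a framework like the paper's may instead prioritize on severity first or use a criticality matrix.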