Advancing AI Trustworthiness Through Patient Simulation: Risk Assessment of Conversational Agents for Antidepressant Selection
arXiv cs.CL / 3/30/2026
Key Points
- The paper proposes a patient simulator that generates realistic, controllable healthcare conversations to evaluate conversational agents at scale for risk assessment across populations.
- The simulator is built around NIST AI Risk Management Framework concepts and combines medical profiles from All of Us EHR data, linguistic profiles tied to health literacy, and behavioral profiles (cooperative, distracted, adversarial).
- In 500 simulations assessing an AI decision aid for antidepressant selection, performance degraded monotonically as health literacy decreased: concept retrieval ranged from 47.6% for limited-literacy profiles to 81.9% for proficient ones.
- Medical concept fidelity was high (96.6%), with strong human and LLM-judge agreement (kappa = 0.73 and 0.78, respectively); behavioral profile classification was also reliable (kappa = 0.93), while linguistic profile agreement was moderate (kappa = 0.61).
- The study concludes that health literacy is a primary, measurable risk factor for conversational healthcare AI, implying the need for more equitable deployment and evaluation practices.
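The agreement figures above are Cohen's kappa scores, which measure how often two raters (here, human annotators and an LLM judge) assign the same label beyond what chance alone would produce. As a minimal sketch of the metric itself (the labels and ratings below are illustrative, not taken from the paper):

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b) and rater_a, "need equal, non-empty ratings"
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items where both raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical behavioral-profile labels from a human rater and an LLM judge.
human = ["cooperative", "cooperative", "adversarial", "adversarial"]
llm   = ["cooperative", "cooperative", "adversarial", "cooperative"]
print(cohens_kappa(human, llm))  # → 0.5
```

Kappa of 1.0 means perfect agreement, 0 means agreement no better than chance; values around 0.73 to 0.93, as reported in the paper, are conventionally read as substantial to near-perfect.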