Disentangling Prompt Element Level Risk Factors for Hallucinations and Omissions in Mental Health LLM Responses
arXiv cs.CL / 4/3/2026
Key Points
- The paper proposes UTCO (User, Topic, Context, Tone), a prompt-construction framework for systematically stress-testing mental-health LLM responses by combining controllable inquiry elements rather than relying on static benchmark sets (see the composition sketch after this list).
- In experiments with 2,075 UTCO-generated prompts, hallucinations appeared in 6.5% of responses and omissions in 13.2%, meaning omissions were roughly twice as frequent as hallucinations and constitute a substantial, safety-relevant failure mode.
- Omission failures were especially concentrated in prompts involving crisis and suicidal ideation, highlighting elevated risk in high-distress scenarios.
- Across multiple evaluation approaches (regression, element-specific matching, and similarity-matched comparisons), the most consistent predictors of failures were the prompt’s context and tone rather than user-background indicators (see the regression sketch after this list).
- The authors argue that evaluations should treat omissions as a primary safety outcome and broaden coverage to include underrepresented narrative, high-distress inquiries.
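The paper’s generation code is not reproduced in this digest, but the core idea of UTCO is combinatorial: each prompt is assembled from one value per element. The minimal Python sketch below illustrates that composition; every element value, the template, and the `generate_prompts` helper are hypothetical stand-ins, not the authors’ actual inventories.

```python
import itertools

# Hypothetical UTCO element inventories; the paper's actual values are not
# reproduced here.
ELEMENTS = {
    "user":    ["a college student", "a new parent", "a retiree"],
    "topic":   ["persistent low mood", "panic attacks", "suicidal thoughts"],
    "context": ["after losing my job", "during final exams", "with no prior treatment"],
    "tone":    ["matter-of-fact", "desperate", "minimizing"],
}

# Hypothetical template; how the study actually realizes tone (e.g., by
# rewriting each prompt in that register) is not specified in this summary.
TEMPLATE = (
    "I'm {user} dealing with {topic} {context}. "
    "Please answer as if I sound {tone}. What should I do?"
)

def generate_prompts(elements, template):
    """Yield (element_assignment, prompt_text) for every element combination."""
    keys = list(elements)
    for combo in itertools.product(*(elements[k] for k in keys)):
        slots = dict(zip(keys, combo))
        yield slots, template.format(**slots)

if __name__ == "__main__":
    prompts = list(generate_prompts(ELEMENTS, TEMPLATE))
    print(len(prompts))      # 3 * 3 * 3 * 3 = 81 combinations in this toy setup
    print(prompts[0][1])
```

With richer inventories per element, the same cross-product easily reaches the scale of the study’s 2,075 prompts.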
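Likewise, the element-level risk analysis can be pictured as a regression of a binary failure flag on one-hot-encoded prompt elements. The sketch below uses synthetic data and scikit-learn; the column names, level names, and modeling details are assumptions, not the authors’ pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
N = 2075  # same number of prompts as in the study

# Synthetic stand-in data: one row per prompt with its four UTCO elements
# and a binary omission flag (hypothetical levels throughout).
df = pd.DataFrame({
    "user":    rng.choice(["student", "parent", "retiree"], N),
    "topic":   rng.choice(["low_mood", "panic", "suicidal_ideation"], N),
    "context": rng.choice(["job_loss", "exams", "untreated"], N),
    "tone":    rng.choice(["matter_of_fact", "desperate", "minimizing"], N),
})
df["omission"] = rng.random(N) < 0.132  # matches the reported 13.2% base rate

# One-hot encode, dropping one level per element as the reference category.
enc = OneHotEncoder(drop="first")
X = enc.fit_transform(df[["user", "topic", "context", "tone"]])
y = df["omission"].astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient estimates a level's log-odds shift in omission risk
# relative to that element's reference level.
for name, coef in sorted(zip(enc.get_feature_names_out(), model.coef_[0]),
                         key=lambda t: -abs(t[1])):
    print(f"{name:30s} {coef:+.3f}")
```

On real labels, larger coefficients on context and tone levels than on user levels would mirror the paper’s finding that context and tone are the most consistent failure predictors; with the random labels above, all coefficients hover near zero.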