Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
arXiv cs.CL / 4/9/2026
Key Points
- The paper finds that common LLM-as-a-judge approaches perform poorly on mental health counseling data, reaching only about 52% accuracy and sometimes near-zero recall for hallucination detection.
- It attributes the weakness to LLM judges’ inability to capture the nuanced linguistic and therapeutic patterns that human domain experts rely on for safety-critical evaluation.
- The authors propose a human+LLM framework that extracts interpretable, domain-informed features across five dimensions: logical consistency, entity verification, factual accuracy, linguistic uncertainty, and professional appropriateness.
- Experiments using both a public mental health dataset and a new human-annotated dataset show that traditional ML models trained on these features achieve stronger hallucination detection (0.717 F1 on the custom annotated set; 0.849 F1 on the public benchmark) but more modest omission detection performance (0.59–0.64 F1); a minimal sketch of such a feature-based pipeline appears after this list.
- Overall, the work argues that combining domain expertise with structured automated evaluation is more reliable and transparent than relying on black-box LLM judging for high-stakes mental health chatbot use.
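To make the framework concrete, here is a minimal sketch of how interpretable features across the five dimensions could feed a traditional ML classifier. The feature names, scoring conventions, and choice of gradient boosting below are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: interpretable features -> classical classifier.
# Feature definitions and classifier choice are assumptions for illustration.
from dataclasses import dataclass

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


@dataclass
class ResponseFeatures:
    """Interpretable scores over the paper's five dimensions (assumed 0-1 scales)."""
    logical_consistency: float       # e.g., contradiction score vs. the user's message
    entity_verification: float       # fraction of named entities traceable to context
    factual_accuracy: float          # agreement with a clinical knowledge source
    linguistic_uncertainty: float    # density of hedging / uncertainty markers
    professional_appropriateness: float  # adherence to counseling-style guidelines

    def to_vector(self) -> list[float]:
        return [
            self.logical_consistency,
            self.entity_verification,
            self.factual_accuracy,
            self.linguistic_uncertainty,
            self.professional_appropriateness,
        ]


def train_hallucination_detector(features: list[ResponseFeatures], labels: list[int]):
    """Train a traditional ML model on the feature vectors.

    labels: 1 = hallucinated response, 0 = faithful response (per human annotation).
    Returns the fitted classifier and a cross-validated F1 estimate, to mirror
    the metric reported in the paper.
    """
    X = [f.to_vector() for f in features]
    clf = GradientBoostingClassifier(random_state=0)
    f1 = cross_val_score(clf, X, labels, cv=5, scoring="f1").mean()
    clf.fit(X, labels)
    return clf, f1
```

The appeal of this setup, as the paper argues, is transparency: each feature is a human-auditable signal, so a flagged response can be traced back to the dimension that triggered it rather than to an opaque LLM-judge verdict.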