OMIND: Framework for Knowledge Grounded Finetuning and Multi-Turn Dialogue Benchmark for Mental Health LLMs

arXiv cs.CL / 3/27/2026


Key Points

  • The paper identifies key barriers to adapting LLMs for mental health use, including the scarcity of high-quality, interpretable, knowledge-grounded training data; training paradigms restricted to core capabilities; and weak evaluation for multi-turn dialogue settings.
  • It proposes the oMind framework for knowledge-grounded fine-tuning and alignment of mental-health-focused LLM agents, targeting diverse conversational capabilities.
  • The authors introduce a sizable (~164k-example) multi-task SFT dataset built via a generation pipeline that combines structured knowledge retrieval, LLM-based pruning, and human review to improve quality (see the sketch after this list).
  • They also publish oMind-Chat, a new multi-turn benchmark with expert annotations at both turn and conversation levels to support more realistic evaluation.
  • Experiments report that oMind-tuned models outperform baselines on core capabilities and conversation tasks, with oMind-LLM showing improved reasoning performance (up to an 80% win rate).
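
The key points above outline a three-stage data generation pipeline but give no implementation details. As a rough illustration, here is a minimal sketch of what such a pipeline could look like; every name in it (`SFTExample`, `build_sft_dataset`, the `retrieve`/`generate`/`score` interfaces, the task labels, and the 0.8 threshold) is a hypothetical stand-in, not the authors' code.

```python
# Hypothetical sketch of the three-stage data generation pipeline described
# above: structured knowledge retrieval -> LLM-based pruning -> human review.
# All names are illustrative stand-ins, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class SFTExample:
    task: str      # e.g. "screening_reasoning" or "empathetic_reply"
    prompt: str    # input grounded in retrieved clinical knowledge
    response: str  # generated answer that survived pruning and review


def human_review(draft: SFTExample) -> SFTExample | None:
    """Placeholder for the expert review step: accept, edit, or reject."""
    return draft  # a real pipeline would route this to human annotators


def build_sft_dataset(seed_topics, knowledge_base, generator_llm, judge_llm,
                      min_score: float = 0.8) -> list[SFTExample]:
    """Assemble multi-task SFT data; all interfaces here are assumed."""
    dataset: list[SFTExample] = []
    for topic in seed_topics:
        # 1. Structured knowledge retrieval: fetch vetted facts for the topic.
        facts = knowledge_base.retrieve(topic, top_k=5)
        for task in ("screening_reasoning", "empathetic_reply"):
            # 2. Generation: draft a knowledge-grounded example per task.
            draft = generator_llm.generate(task=task, topic=topic, facts=facts)
            # 3. LLM-based pruning: discard drafts a judge model scores as
            #    poorly grounded in the retrieved facts.
            if judge_llm.score(draft, facts) < min_score:
                continue
            # 4. Human review: experts accept, edit, or reject what remains.
            reviewed = human_review(draft)
            if reviewed is not None:
                dataset.append(reviewed)
    return dataset
```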

Abstract

Large Language Models (LLMs) have shown remarkable capabilities on complex tasks, yet adapting them to the medical domain, and to mental health specifically, poses distinct challenges. Mental health is a rising concern globally, and LLMs have substantial potential to help address it. We highlight three primary challenges for LLMs in mental health: a lack of high-quality, interpretable, knowledge-grounded training data; training paradigms restricted to core capabilities; and limited evaluation of multi-turn dialogue settings. To address these, we present the oMind framework, which covers training and aligning LLM agents for diverse capabilities, including conversation, together with a high-quality ~164k-example multi-task SFT dataset produced by our generation pipeline based on structured knowledge retrieval, LLM-based pruning, and review actions. We also introduce oMind-Chat, a novel multi-turn benchmark dataset with expert-annotated turn-level and conversation-level rubrics. Our diverse experiments on both core capabilities and conversations show that oMind LLMs consistently outperform baselines. oMind-LLM also demonstrates significantly better reasoning, with up to an 80% win rate.
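
The abstract mentions expert-annotated rubrics at both the turn and conversation level, and reports a pairwise win rate, but this summary gives no concrete data format. The sketch below is one plausible schema under those descriptions; all field names, score ranges, and helper functions are assumptions, not the released oMind-Chat layout.

```python
# Hypothetical schema for a multi-turn benchmark entry with expert rubrics
# at both the turn and conversation level, plus a pairwise win-rate helper.
# Field names and score ranges are illustrative; the released oMind-Chat
# format may differ.

from dataclasses import dataclass, field


@dataclass
class TurnAnnotation:
    turn_index: int
    speaker: str   # "user" or "assistant"
    text: str
    rubric_scores: dict[str, int] = field(default_factory=dict)
    # e.g. {"empathy": 4, "safety": 5, "knowledge_grounding": 3}


@dataclass
class ConversationAnnotation:
    conversation_id: str
    turns: list[TurnAnnotation] = field(default_factory=list)
    conversation_rubric: dict[str, int] = field(default_factory=dict)
    # e.g. {"coherence": 4, "overall_helpfulness": 5}


def mean_turn_score(conv: ConversationAnnotation, dimension: str) -> float:
    """Average one rubric dimension over the assistant's annotated turns."""
    scores = [
        t.rubric_scores[dimension]
        for t in conv.turns
        if t.speaker == "assistant" and dimension in t.rubric_scores
    ]
    return sum(scores) / len(scores) if scores else float("nan")


def win_rate(outcomes: list[str]) -> float:
    """Pairwise win rate as wins / (wins + losses); conventions for counting
    ties vary across evaluations, so ties are simply excluded here."""
    wins = outcomes.count("win")
    losses = outcomes.count("loss")
    return wins / (wins + losses) if wins + losses else float("nan")
```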