TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots
arXiv cs.AI / 3/20/2026
Key Points
- THERAPYGYM introduces a framework for evaluating and improving therapy chatbots on two axes: fidelity to evidence-based cognitive behavioral therapy (CBT) techniques, and safety. Fidelity is scored with an automated Cognitive Therapy Rating Scale (CTRS) pipeline; safety is assessed with a multi-label annotation scheme.
- It releases THERAPYJUDGEBENCH, a validation set of 116 dialogues with 1,270 expert ratings, for auditing and calibrating LLM judges against licensed clinicians and addressing the biases of LLM-based judging.
- The framework can drive safe reinforcement learning (RL): CTRS and safety scores serve as rewards, and configurable patient simulations cover diverse symptom profiles.
- Empirical results show that models trained with THERAPYGYM improve in clinical fidelity, with CTRS scores rising from 0.10 to 0.60 (and from 0.16 to 0.59 under LLM judges).
- Overall, the work supports scalable development of therapy chatbots that are faithful to evidence-based practice and safer in high-stakes mental-health settings.
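
As a rough illustration of the reward idea in the third point, the sketch below combines a fidelity score with a penalty per safety flag. It is not the paper's reward function, which this summary does not specify: the names `TurnJudgment` and `combined_reward`, the `safety_penalty` weight, the example flag label, and the assumption that the CTRS score is normalized to [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TurnJudgment:
    """Hypothetical container for one judged chatbot response."""
    ctrs_score: float              # fidelity score, ASSUMED normalized to [0, 1]
    safety_flags: FrozenSet[str]   # multi-label safety annotations; empty if none


def combined_reward(judgment: TurnJudgment, safety_penalty: float = 1.0) -> float:
    """Return the fidelity score minus a fixed penalty per safety flag.

    The linear form and the default weight are illustrative assumptions,
    not values taken from the paper.
    """
    if not 0.0 <= judgment.ctrs_score <= 1.0:
        raise ValueError("ctrs_score is expected to lie in [0, 1]")
    return judgment.ctrs_score - safety_penalty * len(judgment.safety_flags)


if __name__ == "__main__":
    safe = TurnJudgment(ctrs_score=0.6, safety_flags=frozenset())
    # "missed_risk_disclosure" is a made-up label used only for this example.
    unsafe = TurnJudgment(ctrs_score=0.6,
                          safety_flags=frozenset({"missed_risk_disclosure"}))
    print(combined_reward(safe), combined_reward(unsafe))
```

With a penalty weight at least as large as the maximum fidelity score, any flagged response scores below every unflagged one, which is one simple way to keep safety from being traded away for fidelity.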