Developing and evaluating a chatbot to support maternal health care
arXiv cs.AI / 3/16/2026
Key Points
- The paper introduces a chatbot for maternal health in India that combines stage-aware triage, hybrid retrieval over guidelines, and evidence-conditioned generation from an LLM to handle short, code-mixed multilingual queries.
- It provides an evaluation workflow for high-stakes deployment, including a labeled triage benchmark (N=150) with emergency recall metrics, a synthetic multi-evidence retrieval benchmark (N=100) with evidence labels, an LLM-as-judge comparison on real queries (N=781), and expert validation.
- Findings indicate that trustworthy medical assistants in multilingual, noisy settings require defense-in-depth design and multi-method evaluation rather than reliance on a single model or metric.
- The work reflects a multi-stakeholder collaboration among academia, a health tech company, a public health nonprofit, and a hospital, highlighting real-world deployment considerations in low-resource settings.
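
The triage benchmark above is reported with emergency recall metrics. As a minimal sketch of what that metric measures, assuming a simple string label per query (the label names, the function `emergency_recall`, and the toy data below are illustrative assumptions, not taken from the paper), emergency recall is the share of truly urgent queries that the triage step actually flags as urgent:

```python
from typing import Sequence


def emergency_recall(
    gold: Sequence[str],
    predicted: Sequence[str],
    emergency_label: str = "emergency",  # assumed label name, not from the paper
) -> float:
    """Fraction of gold emergency cases that were also predicted as emergencies."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the gold emergencies and how many of them the triage step caught.
    total_emergencies = sum(1 for g in gold if g == emergency_label)
    if total_emergencies == 0:
        raise ValueError("no emergency cases in the gold labels")

    caught = sum(
        1
        for g, p in zip(gold, predicted)
        if g == emergency_label and p == emergency_label
    )
    return caught / total_emergencies


# Hypothetical toy labels for illustration only; these are not results from the paper.
gold = ["emergency", "routine", "emergency", "routine", "emergency"]
pred = ["emergency", "routine", "routine", "emergency", "emergency"]

print(f"emergency recall: {emergency_recall(gold, pred):.3f}")
```

In a high-stakes setting this metric is usually weighted more heavily than overall accuracy, because a missed emergency (a false negative) is more costly than an unnecessary escalation.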



