Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Romanized Scripts in a Real World Setting
arXiv cs.CL / 4/1/2026
💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research
Key Points
- The paper evaluates how romanized (Latin-script) versus native-script Indian-language inputs affect the reliability of leading LLMs in maternal and newborn healthcare triage.
- Benchmarks on a real, user-generated health-query dataset across five Indian languages and Nepali find consistent performance degradation for romanized messages, with gaps up to 24 points across languages and models.
- The authors propose an uncertainty-based selective routing approach to mitigate the “script gap,” improving handling of low-confidence romanized queries.
- The study estimates that the observed degradation could translate into nearly 2 million excess triage errors at their partner maternal health organization alone, underscoring safety risks.
- Overall, the findings reveal a safety blind spot where LLMs may seem to understand romanized text but still fail to triage reliably in high-stakes clinical settings.
💡 Insights using this article
This article is featured in our daily AI news digest — key takeaways and action items at a glance.
Related Articles

Black Hat Asia
AI Business

Knowledge Governance For The Agentic Economy.
Dev.to

AI server farms heat up the neighborhood for miles around, paper finds
The Register

Paperclip: Công Cụ Miễn Phí Biến AI Thành Đội Phát Triển Phần Mềm
Dev.to
Does the Claude “leak” actually change anything in practice?
Reddit r/LocalLLaMA