Domain-Adapted Small Language Models for Reliable Clinical Triage

arXiv cs.CL / April 30, 2026


Key Points

  • The paper investigates whether open-source small language models (SLMs) can serve as reliable, privacy-preserving clinical decision-support tools for assigning Emergency Severity Index (ESI) categories from highly variable free-text triage notes.
  • Across different prompting pipelines, the study finds that clinical vignettes, i.e., concise summaries of triage narratives, produce the most accurate and stable predictions.
  • The model Qwen2.5-7B shows the best trade-off among accuracy, prediction stability, and computational efficiency compared with other tested SLMs.
  • After large-scale domain adaptation with expert-curated and silver-standard pediatric triage data, fine-tuned Qwen2.5-7B models reduce both discordance and clinically significant errors, outperforming baseline SLMs and even advanced proprietary LLMs such as GPT-4o.
  • The authors conclude that institution-specific, domain-targeted fine-tuning is a practical path to dependable ESI support, and that simpler targeted tuning can outperform more complex inference strategies.
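
The vignette-based pipeline described above can be sketched as a two-step prompt chain: first condense the raw triage note into a concise clinical vignette, then ask the model to assign an ESI level from that vignette. The sketch below is an illustration, not the paper's actual code; `run_model`, the prompt templates, and the parsing logic are all assumptions standing in for whatever local SLM interface (e.g., a Qwen2.5-7B inference endpoint) an institution would use.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt templates for the two-step vignette pipeline.
VIGNETTE_PROMPT = (
    "Summarize the following emergency-department triage note as a "
    "concise clinical vignette (2-3 sentences):\n\n{note}"
)
ESI_PROMPT = (
    "You are an emergency triage assistant. Given the clinical vignette "
    "below, assign an Emergency Severity Index (ESI) level from 1 (most "
    "urgent) to 5 (least urgent). Answer with the single digit only.\n\n"
    "Vignette: {vignette}"
)

def parse_esi(text: str) -> Optional[int]:
    """Extract the first ESI level (a digit 1-5) from a model response."""
    match = re.search(r"\b([1-5])\b", text)
    return int(match.group(1)) if match else None

def triage_pipeline(note: str, run_model: Callable[[str], str]) -> Optional[int]:
    """Two-step chain: raw note -> vignette -> ESI level.

    `run_model` is a placeholder for any text-in/text-out SLM call.
    """
    vignette = run_model(VIGNETTE_PROMPT.format(note=note))
    answer = run_model(ESI_PROMPT.format(vignette=vignette))
    return parse_esi(answer)
```

Keeping the model call behind a plain `Callable` reflects the paper's privacy argument: the same pipeline runs unchanged against an on-premise open-source SLM, with no data leaving the institution.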

Abstract

Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliable, privacy-preserving decision-support tools for clinical triage. We systematically compared multiple SLMs across diverse prompting pipelines and found that clinical vignettes (concise summaries of triage narratives) yielded the most accurate predictions. Among the SLMs tested, Qwen2.5-7B demonstrated the strongest balance of accuracy, stability, and computational efficiency. Through large-scale domain adaptation using expert-curated and silver-standard pediatric triage data, fine-tuned Qwen2.5-7B models substantially reduced discordance and clinically significant errors, outperforming all baseline SLMs as well as advanced proprietary large language models (LLMs) such as GPT-4o. These findings highlight the feasibility of institution-specific SLMs for reliable, privacy-preserving ESI decision support and underscore the importance of targeted fine-tuning over more complex inference strategies.