PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation

arXiv cs.CL / 3/26/2026


Key Points

  • The paper proposes PLACID, a privacy-preserving approach to clinical acronym inference and disambiguation that runs entirely on-device to avoid sending Protected Health Information to cloud LLMs.
  • It uses a cascaded pipeline: local general-purpose models detect clinical acronyms and then route them to domain-specific biomedical models to generate context-relevant expansions.
  • The authors find that general instruction-following models can achieve strong acronym detection accuracy (~0.988) but suffer notably in expansion quality (~0.655), creating a gap for safe clinical use.
  • By switching to domain-specific biomedical models for expansion, the cascaded method improves expansion accuracy to ~0.81 while meeting on-device constraints using small models in the ~2B–10B parameter range.
  • The work frames acronym disambiguation as a high-stakes healthcare task where privacy-preserving deployment can reduce the risk of life-threatening medication errors caused by abbreviation misinterpretation.

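The two-stage cascade described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the `detect_acronyms` and `expand_acronym` functions below are hypothetical stubs standing in for the general-purpose detection model and the domain-specific biomedical expansion model, both of which PLACID runs locally on-device.

```python
import re

def detect_acronyms(note: str) -> list[str]:
    """Stage 1: a general-purpose local model flags candidate acronyms.
    Stubbed here with a regex over all-caps tokens of length 2-5."""
    return re.findall(r"\b[A-Z]{2,5}\b", note)

def expand_acronym(acronym: str, context: str) -> str:
    """Stage 2: a domain-specific biomedical model produces a
    context-relevant expansion. Stubbed with a tiny lookup table;
    the real pipeline would prompt a local 2B-10B medical model."""
    table = {"MS": "multiple sclerosis", "RA": "rheumatoid arthritis"}
    return table.get(acronym, acronym)

def cascade(note: str) -> dict[str, str]:
    """Route each detected acronym to the expansion stage. Because both
    stages run on-device, no PHI is sent to external servers."""
    return {a: expand_acronym(a, note) for a in detect_acronyms(note)}

print(cascade("Patient with MS started on a new disease-modifying therapy."))
```

The design point the paper makes is that the routing itself is cheap; the accuracy gain comes from which model handles stage 2.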
Abstract

Large Language Models (LLMs) offer transformative solutions across many domains, but healthcare integration is hindered by strict data privacy constraints. Clinical narratives are dense with ambiguous acronyms, and misinterpreting these abbreviations can precipitate severe outcomes such as life-threatening medication errors. While cloud-dependent LLMs excel at Acronym Disambiguation, transmitting Protected Health Information to external servers violates privacy frameworks. To bridge this gap, this study pioneers the evaluation of small-parameter models deployed entirely on-device to ensure privacy preservation. We introduce a privacy-preserving cascaded pipeline that leverages general-purpose local models to detect clinical acronyms and routes them to domain-specific biomedical models for context-relevant expansions. Results reveal that while general instruction-following models achieve high detection accuracy (~0.988), their expansion accuracy plummets (~0.655). Our cascaded approach uses domain-specific medical models to increase expansion accuracy to ~0.81. This novel work demonstrates that privacy-preserving, on-device (2B–10B) models deliver high-fidelity clinical acronym disambiguation support.