From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight LLM

arXiv cs.CL · 26 Mar 2026


Key Points

  • The paper proposes Med-Shicheng, a framework for using lightweight LLMs to preserve, standardize, and scale physicians’ diagnostic-and-therapeutic expertise, including case-dependent adaptation rules.
  • Med-Shicheng is implemented in five stages and targets knowledge from five distinguished Chinese Medicine physicians, training a single model across seven clinical TCM tasks (from etiology-pathogenesis to prescription generation and clinical advice).
  • Experiments using Qwen2.5-1.5B-Base indicate the approach can run on resource-constrained GPUs while achieving performance comparable to stronger models such as DeepSeek-R1 and GPT-5.
  • The authors evaluate the reliability of LLM-as-a-judge, finding that automated judging captures overall trends but can be biased on fine-grained, individualized distinctions, implying continued physician involvement when ground truth is unavailable.
  • The work frames the core challenge as knowledge systems being slow to develop and difficult to transmit at scale, and positions standardized LLM training as a pathway to address expertise scarcity in clinical settings.
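The judge-reliability point above amounts to comparing automated scores against physician scores at two granularities: overall trend agreement versus per-case fine-grained agreement. A minimal sketch of the trend-agreement half, using Spearman rank correlation implemented in plain Python (the paper does not specify its agreement metric; the scores and function names here are illustrative assumptions):

```python
def ranks(xs):
    """Assign average ranks (1-based), handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # group tied values and give them their average rank
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rho = Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical per-case scores (1-5) from an LLM judge and a physician:
judge_scores = [5, 4, 3, 2, 1]
physician_scores = [5, 3, 4, 2, 1]
print(spearman(judge_scores, physician_scores))  # → 0.9
```

A high rank correlation like this is consistent with the paper's finding that automated judging tracks overall trends, while still allowing case-level disagreements (cases 2 and 3 are swapped here) of exactly the fine-grained kind where the authors argue physicians remain necessary.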

Abstract

Medicine is an empirical discipline refined through long-term observation and the messy, high-variance reality of clinical practice. Physicians build diagnostic and therapeutic competence through repeated cycles of application, reflection, and improvement, forming individualized methodologies. Yet outcomes vary widely, and master physicians' knowledge systems are slow to develop and hard to transmit at scale, contributing to the scarcity of high-quality clinical expertise. To address this, we propose Med-Shicheng, a general framework that enables large language models to systematically learn and transfer distinguished physicians' diagnostic-and-therapeutic philosophy and case-dependent adaptation rules in a standardized way. Built on Tianyi, Med-Shicheng consists of five stages. We target five National Masters of Chinese Medicine or distinguished TCM physicians, curate multi-source materials, and train a single model to internalize all five knowledge systems across seven tasks, including etiology-pathogenesis analysis, syndrome diagnosis, treatment principle selection, prescription generation, prescription explanation, symptom evolution with regimen adjustment, and clinical advice. Implemented on Qwen2.5-1.5B-Base, Med-Shicheng runs on resource-constrained GPUs while achieving performance comparable to DeepSeek-R1 and GPT-5. We also examine the reliability of LLM-as-a-judge versus physician evaluation: automated judging tracks overall trends but shows bias on fine-grained individualized distinctions, highlighting the need for physician involvement when ground truth is unavailable and for domain-adapted judge models.
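The abstract describes training one model to internalize five physicians' knowledge systems across seven tasks. One common way to realize such multi-task, multi-expert training is to tag each example with its task and source physician in the prompt; the record format and field names below are illustrative assumptions, not the paper's actual data schema:

```python
# The seven clinical TCM tasks named in the abstract.
TASKS = [
    "etiology_pathogenesis_analysis",
    "syndrome_diagnosis",
    "treatment_principle_selection",
    "prescription_generation",
    "prescription_explanation",
    "symptom_evolution_regimen_adjustment",
    "clinical_advice",
]

def format_example(task, physician, case_text, target):
    """Build one hypothetical instruction-tuning record.

    Tagging the prompt with the task and physician lets a single model
    condition on both, so it can keep the five knowledge systems distinct.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    prompt = (
        f"[physician: {physician}] [task: {task}]\n"
        f"Case: {case_text}\n"
        f"Answer:"
    )
    return {"prompt": prompt, "completion": " " + target}

ex = format_example(
    "syndrome_diagnosis",
    "Physician_A",
    "Chief complaint: recurrent cough, aggravated at night.",
    "Syndrome: ...",
)
print(ex["prompt"])
```

Conditioning on explicit tags is a standard alternative to training five separate models; it matches the abstract's claim that a single lightweight model (Qwen2.5-1.5B-Base in the paper) internalizes all five knowledge systems at once.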