Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

arXiv cs.AI / 4/21/2026


Key Points

  • The paper addresses the challenge of extracting clinically relevant information from unstructured medical texts by focusing on fine-grained Medical Entity Recognition (MER) rather than coarse entity types.
  • It evaluates an open-source LLaMA3 model across 18 detailed clinical entity categories and compares three learning approaches: zero-shot, few-shot, and LoRA-based fine-tuning.
  • To improve few-shot performance, the authors use BioBERT-derived token- and sentence-level embedding similarity to select the most relevant examples.
  • Methodological consistency is emphasized by applying all paradigms to the same LLaMA3 backbone, enabling a fair comparison across learning settings.
  • Results show that fine-tuned LLaMA3 significantly outperforms zero-shot and few-shot setups, reaching an F1 score of 81.24% for granular medical entity extraction.
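The embedding-similarity example selection in the third point can be illustrated with a small sketch: rank a pool of candidate sentences by cosine similarity to the query and keep the top-k as few-shot demonstrations. The toy 3-d vectors below stand in for BioBERT sentence embeddings (the real method encodes text with a pre-trained BioBERT model); the function names and example data are illustrative, not the authors' code.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_few_shot(query_emb, pool, k=2):
    """Pick the k candidate sentences whose embeddings are most
    similar to the query embedding (sentence-level selection)."""
    ranked = sorted(pool, key=lambda p: cosine(query_emb, p[0]), reverse=True)
    return [sent for _, sent in ranked[:k]]

# Toy 3-d embeddings standing in for BioBERT vectors (hypothetical data).
pool = [
    ([0.9, 0.1, 0.0], "Patient denies chest pain."),
    ([0.0, 1.0, 0.2], "Discharged on metformin 500 mg."),
    ([0.8, 0.2, 0.1], "Reports intermittent chest tightness."),
]
query = [1.0, 0.0, 0.0]
print(select_few_shot(query, pool, k=2))
# → ['Patient denies chest pain.', 'Reports intermittent chest tightness.']
```

Token-level selection works the same way, except similarities are computed between token embeddings and then aggregated per sentence.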

Abstract

Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent advancements in large language models (LLMs) have shown competitive MER performance; however, evaluations often focus on general entity types, offering limited utility for real-world clinical needs requiring finer-grained extraction. To address this gap, we rigorously evaluated the open-source LLaMA3 model for fine-grained medical entity recognition across 18 clinically detailed categories. To optimize performance, we employed three learning paradigms: zero-shot, few-shot, and fine-tuning with Low-Rank Adaptation (LoRA). To further enhance few-shot learning, we introduced two example selection methods based on token- and sentence-level embedding similarity, utilizing a pre-trained BioBERT model. Unlike prior work assessing zero-shot and few-shot performance on proprietary models (e.g., GPT-4) or fine-tuning different architectures, we ensured methodological consistency by applying all strategies to a unified LLaMA3 backbone, enabling fair comparison across learning settings. Our results show that fine-tuned LLaMA3 surpasses zero-shot and few-shot approaches by 63.11% and 35.63%, respectively, achieving an F1 score of 81.24% in granular medical entity extraction.
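The LoRA fine-tuning the abstract refers to trains only a low-rank correction to each frozen weight matrix: the adapted forward pass is Wx + (α/r)·B·A·x, where A is r×d_in and B is d_out×r, so just r·(d_in + d_out) parameters are learned per layer. The pure-Python sketch below shows that arithmetic with toy numbers; it is an illustration of the low-rank update, not the authors' training code (which would typically use a library such as Hugging Face PEFT on the LLaMA3 backbone).

```python
def matvec(M, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Forward pass with a LoRA adapter: frozen base path W @ x plus
    the scaled low-rank path (alpha/r) * B @ (A @ x)."""
    base = matvec(W, x)              # pretrained weights, kept frozen
    delta = matvec(B, matvec(A, x))  # low-rank trainable update
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy 2x2 weight with a rank-1 adapter (illustrative numbers only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # r x d_in  = 1 x 2
B = [[0.5], [0.5]]      # d_out x r = 2 x 1
x = [2.0, 0.0]
print(lora_forward(W, A, B, x, alpha=2.0, r=1))
# → [4.0, 2.0]
```

Because only A and B receive gradients, fine-tuning LLaMA3 this way updates a small fraction of the model's parameters while leaving the pretrained weights untouched.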