AI Navigate

LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes

arXiv cs.AI / 3/17/2026

📰 News | Signals & Early Trends | Models & Research

Key Points

  • LLM-MINE is introduced as a framework to extract ADRD phenotypes from unstructured clinical notes in electronic health records.
  • The method uses two expert-defined phenotype lists and few-shot prompting to improve phenotype extraction and clustering performance.
  • Few-shot prompting with the combined phenotype lists gives the best clustering performance (ARI=0.290, NMI=0.232), substantially outperforming biomedical NER and dictionary-based baselines.
  • The results show memory impairment as the strongest discriminator across cohorts and support unsupervised disease staging using the extracted phenotypes, indicating potential clinical utility.
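
For readers unfamiliar with the two clustering scores quoted above, the following is a minimal sketch of how the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) can be computed from a reference labeling and a predicted clustering. It is not the paper's evaluation code: the function names are hypothetical, the toy labels are made up, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the summary does not say which variant the authors used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI: pair-counting agreement between two labelings, corrected for chance."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # degenerate case: no pairs to compare
    cells = Counter(zip(labels_true, labels_pred))  # contingency table cells
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both labelings are trivial (e.g. a single cluster each)
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log((c / n) / ((rows[t] / n) * (cols[p] / n)))
        for (t, p), c in cells.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    if h_true + h_pred == 0:
        return 1.0  # both labelings have a single cluster
    return mutual_info / ((h_true + h_pred) / 2)


# Toy example with made-up labels: the same partition under renamed cluster ids.
cohort = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]
ari = adjusted_rand_index(cohort, clusters)
nmi = normalized_mutual_info(cohort, clusters)
```

Both scores equal 1 when the clustering reproduces the reference partition up to relabeling. For a random clustering, ARI is near 0 and can go negative. Read on that scale, ARI=0.290 and NMI=0.232 indicate partial but above-chance agreement.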

Abstract

Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic health records (EHR) is critical for early-stage detection and disease staging. However, this information is usually embedded in unstructured text rather than tabular data, making it difficult to extract accurately. We therefore propose LLM-MINE, a Large Language Model-based phenotype mining framework for automatic extraction of ADRD phenotypes from clinical notes. Using two expert-defined phenotype lists, we evaluate the extracted phenotypes by examining their statistical significance across cohorts and their utility for unsupervised disease staging. Chi-square analyses confirm statistically significant phenotype differences across cohorts, with memory impairment being the strongest discriminator. Few-shot prompting with the combined phenotype lists achieves the best clustering performance (ARI=0.290, NMI=0.232), substantially outperforming biomedical NER and dictionary-based baselines. Our results demonstrate that LLM-based phenotype extraction is a promising tool for discovering clinically meaningful ADRD signals from unstructured notes.
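
The chi-square analysis mentioned in the abstract can be illustrated with a short sketch. It assumes a contingency table of cohort (rows) against presence or absence of one extracted phenotype (columns). The function name and the counts are hypothetical and are not taken from the paper, and the code computes only the Pearson statistic and degrees of freedom, without a continuity correction or p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if cohort and phenotype were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up counts for illustration only:
# rows = two cohorts, columns = (phenotype mentioned, phenotype not mentioned).
memory_impairment_counts = [
    [30, 10],
    [10, 30],
]
statistic, dof = chi_square_statistic(memory_impairment_counts)
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05. In practice a library routine such as `scipy.stats.chi2_contingency` would usually be used to obtain the p-value; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the uncorrected value computed here.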