Temporally Phenotyping GLP-1RA Case Reports with Large Language Models: A Textual Time Series Corpus and Risk Modeling

arXiv cs.CL / 4/9/2026

Key Points

  • The paper introduces a textual time-series corpus built from 136 PubMed Open Access single-patient case reports involving glucagon-like peptide 1 receptor agonists (GLP-1RAs), so that narrative timelines become usable for longitudinal risk modeling.
  • It evaluates large language model–based automated timeline extraction against expert-annotated gold-standard timelines, focusing on both event recovery and precise temporal ordering.
  • The best-performing system (GPT5) reached an event coverage of 0.871 and a temporal sequencing score of 0.843 for symptoms, with reliable ordering also reported for diagnoses, treatments, laboratory tests, and outcomes.
  • As a downstream demonstration, a time-to-event analysis in diabetes suggests that GLP-1 users have a lower risk of respiratory sequelae than non-users (HR 0.259, p<0.05), consistent with prior reports of improved respiratory outcomes.
  • The authors plan to release the temporal annotations and code after acceptance, enabling reuse of the dataset and methods.
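
To make the evaluation idea in the key points concrete, here is a minimal sketch of how a predicted timeline might be scored against a gold timeline. Everything in it is an assumption made for illustration: the `(event, hours)` tuple representation, the exact-match normalization, the function names `event_coverage` and `temporal_concordance`, and the toy pancreatitis timeline are all hypothetical, and the paper's actual metrics and matching procedure may differ.

```python
from itertools import combinations

# Hypothetical representation: a timeline is a list of (event text, time in hours),
# with time measured relative to admission (negative = before admission).

def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())

def event_coverage(gold, predicted):
    """Fraction of gold events that also appear in the predicted timeline."""
    gold_events = {normalize(e) for e, _ in gold}
    pred_events = {normalize(e) for e, _ in predicted}
    if not gold_events:
        return 0.0
    return len(gold_events & pred_events) / len(gold_events)

def temporal_concordance(gold, predicted):
    """Fraction of matched event pairs whose predicted order agrees with gold.

    Pairs that are simultaneous in the gold timeline are skipped.
    """
    gold_t = {normalize(e): t for e, t in gold}
    pred_t = {normalize(e): t for e, t in predicted}
    shared = sorted(gold_t.keys() & pred_t.keys())
    agree = total = 0
    for a, b in combinations(shared, 2):
        gold_diff = gold_t[a] - gold_t[b]
        if gold_diff == 0:
            continue
        total += 1
        pred_diff = pred_t[a] - pred_t[b]
        if gold_diff * pred_diff > 0:  # same sign means same ordering
            agree += 1
    return agree / total if total else 0.0

# Toy data, invented for this sketch (not taken from the corpus).
gold = [
    ("semaglutide started", -720), ("nausea", -48), ("admission", 0),
    ("lipase elevated", 2), ("acute pancreatitis", 6), ("discharge", 120),
]
predicted = [
    ("Semaglutide started", -700), ("nausea", -24), ("admission", 0),
    ("acute pancreatitis", 1), ("lipase elevated", 4), ("discharged home", 96),
]

coverage = event_coverage(gold, predicted)
concordance = temporal_concordance(gold, predicted)
```

Exact string matching is the weakest part of this sketch: "discharge" and "discharged home" count as a miss here, so a realistic evaluation would likely need fuzzy or semantic matching of event spans.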

Abstract

Type 2 diabetes case reports describe complex clinical courses, but their timelines are often expressed in language that is difficult to reuse in longitudinal modeling. To address this gap, we developed a textual time-series corpus of 136 PubMed Open Access single-patient case reports involving glucagon-like peptide 1 receptor agonists, with clinical events associated with their most probable reference times. We evaluated automated LLM timeline extraction against gold-standard timelines annotated by clinical domain experts, assessing how well systems recovered clinical events and their timings. The best-performing LLM produced high event coverage (GPT5; 0.871) and reliable temporal sequencing across symptoms (GPT5; 0.843), diagnoses, treatments, laboratory tests, and outcomes. As a downstream demonstration, time-to-event analyses in diabetes suggested lower risk of respiratory sequelae among GLP-1 users versus non-users (HR=0.259, p<0.05), consistent with prior reports of improved respiratory outcomes. Temporal annotations and code will be released upon acceptance.
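
The hazard ratio reported in the abstract comes from a time-to-event analysis whose details are not given here. As a hedged illustration of what such an estimate involves, the sketch below fits a Cox proportional hazards model with a single binary exposure by Newton-Raphson on the partial likelihood (Breslow handling of ties). The function names, the `(time, event, exposed)` record layout, and the toy records are assumptions for this example, not the authors' pipeline or data.

```python
import math

def cox_score_and_information(beta, records):
    """Score and information of the Cox partial likelihood for one binary covariate.

    records: list of (time, event, exposed), where event is 1 if the outcome
    was observed and 0 if the patient was censored, and exposed is 0 or 1.
    """
    score, info = 0.0, 0.0
    for t_i, event_i, x_i in records:
        if not event_i:
            continue
        # Risk set: everyone still under observation at time t_i.
        s0 = s1 = 0.0
        for t_j, _, x_j in records:
            if t_j >= t_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += x_j * w
        p = s1 / s0              # weighted share of exposed patients at risk
        score += x_i - p
        info += p * (1.0 - p)    # valid because the covariate is 0 or 1
    return score, info

def fit_hazard_ratio(records, iterations=25, tol=1e-10):
    """Return exp(beta), the estimated hazard ratio for exposed vs. unexposed."""
    beta = 0.0
    for _ in range(iterations):
        score, info = cox_score_and_information(beta, records)
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)

# Toy records, invented for this sketch: (time to event or censoring, event, exposed).
toy_records = [
    (5, 1, 1), (8, 0, 1), (10, 1, 1), (12, 0, 1),        # exposed group
    (2, 1, 0), (3, 1, 0), (4, 1, 0), (6, 1, 0), (9, 0, 0),  # unexposed group
]

hazard_ratio = fit_hazard_ratio(toy_records)
```

A hazard ratio below 1 means the exposed group experiences the outcome at a lower rate at any given time. In practice an established survival analysis library would be preferable to a hand-rolled fit, since it also provides confidence intervals, p-values, and checks of the proportional hazards assumption.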