TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection

arXiv cs.AI / 4/14/2026


Key Points

  • TrajOnco is introduced as a training-free, multi-agent LLM framework for temporal reasoning over longitudinal EHR data to support multi-cancer early detection.
  • Its chain-of-agents architecture uses long-term memory to reason over sequential clinical events, producing patient-level summaries, evidence-linked rationales, and a predicted risk of cancer diagnosis at 1 year.
  • In zero-shot evaluation on de-identified Truveta EHRs across 15 cancer types, TrajOnco achieved AUROCs of 0.64–0.80 and performed competitively with supervised machine learning on a lung cancer benchmark.
  • Compared with single-agent LLMs, TrajOnco shows improved temporal reasoning, and the multi-agent approach remains effective even with smaller models like GPT-4.1-mini.
  • Human evaluation validates the fidelity of TrajOnco’s outputs, and aggregated interpretable rationales can reveal population-level risk patterns consistent with clinical knowledge.
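
The chain-of-agents idea in the points above can be illustrated with a minimal sketch. This is not TrajOnco's implementation, which the summary does not describe in detail. It assumes worker agents read chronological chunks of events and append notes to a shared memory, after which a manager agent turns that memory into a risk score. All names (`ClinicalEvent`, `Memory`, `run_chain`, `stub_worker`, `stub_manager`) are hypothetical, and the stub agents stand in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ClinicalEvent:
    date: str  # ISO date string, so lexical order equals chronological order
    text: str  # description of the clinical event


@dataclass
class Memory:
    """Hypothetical long-term memory: one note per processed chunk."""
    notes: List[str] = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(self.notes) if self.notes else "(empty)"


def chunk_events(events: List[ClinicalEvent], size: int) -> List[List[ClinicalEvent]]:
    """Sort events chronologically and split them into fixed-size chunks."""
    ordered = sorted(events, key=lambda e: e.date)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def run_chain(
    events: List[ClinicalEvent],
    worker: Callable[[str], str],
    manager: Callable[[str], float],
    chunk_size: int = 2,
) -> Tuple[Memory, float]:
    """Pass each chunk to a worker with the memory so far, then score the result."""
    memory = Memory()
    for chunk in chunk_events(events, chunk_size):
        chunk_text = "\n".join(f"{e.date}: {e.text}" for e in chunk)
        prompt = (
            f"Memory so far:\n{memory.render()}\n\n"
            f"New events:\n{chunk_text}\n\n"
            "Update the patient summary."
        )
        memory.notes.append(worker(prompt))
    final_prompt = (
        f"Patient summary:\n{memory.render()}\n\n"
        "Return the 1-year cancer risk as a number between 0 and 1."
    )
    return memory, manager(final_prompt)


def stub_worker(prompt: str) -> str:
    """Stand-in for an LLM worker: echoes the new events as a note."""
    new_part = prompt.split("New events:\n", 1)[1].split("\n\n", 1)[0]
    return "Seen: " + "; ".join(new_part.splitlines())


def stub_manager(prompt: str) -> float:
    """Stand-in for an LLM manager: a toy keyword rule, not a clinical model."""
    return 0.5 if "weight loss" in prompt else 0.1
```

In this sketch each worker sees only its own chunk plus the accumulated notes, so no single call has to hold the full patient history. Replacing the stubs with real model calls would require prompt design and output parsing that the sketch leaves out.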

Abstract

Accurate estimation of cancer risk from longitudinal electronic health records (EHRs) could support earlier detection and improved care, but modeling such complex patient trajectories remains challenging. We present TrajOnco, a training-free, multi-agent large language model (LLM) framework designed for scalable multi-cancer early detection. Using a chain-of-agents architecture with long-term memory, TrajOnco performs temporal reasoning over sequential clinical events to generate patient-level summaries, evidence-linked rationales, and predicted risk scores. We evaluated TrajOnco on de-identified Truveta EHR data across 15 cancer types using matched case-control cohorts, predicting risk of cancer diagnosis at 1 year. In zero-shot evaluation, TrajOnco achieved AUROCs of 0.64-0.80, performing comparably to supervised machine learning in a lung cancer benchmark while demonstrating better temporal reasoning than single-agent LLMs. The multi-agent design also enabled effective temporal reasoning with smaller-capacity models such as GPT-4.1-mini. The fidelity of TrajOnco's output was validated through human evaluation. Furthermore, TrajOnco's interpretable reasoning outputs can be aggregated to reveal population-level risk patterns that align with established clinical knowledge. These findings highlight the potential of multi-agent LLMs to execute interpretable temporal reasoning over longitudinal EHRs, advancing both scalable multi-cancer early detection and clinical insight generation.
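
For readers unfamiliar with the reported metric, AUROC is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it by pairwise comparison. It is illustrative only: the abstract does not say how the authors computed AUROC, the function name `auroc` is hypothetical, and the labels and scores in the usage lines are toy values, not results from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: 1 = diagnosed within 1 year (case), 0 = control.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.4, 0.5]
toy_auroc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in cohort size, which is fine for illustration. Library routines such as scikit-learn's `roc_auc_score` are the usual choice for real cohorts.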