GraphWalker: Graph-Guided In-Context Learning for Clinical Reasoning on Electronic Health Records

arXiv cs.LG / 4/9/2026

Key Points

  • GraphWalker is a demonstration selection framework for in-context learning (ICL) with large language models on electronic health record (EHR) clinical reasoning. It targets three limitations of existing methods: similarity measures that do not match the model's reasoning needs, missing cohort-level awareness, and redundancy among demonstrations.
  • The method jointly models patient clinical information with LLM-estimated information gain to better align demonstration selection with the reasoning needs of the model.
  • It adds “Cohort Discovery” to incorporate population-level structure and reduce noisy local optima when selecting examples.
  • For information aggregation, GraphWalker uses a Lazy Greedy Search with Frontier Expansion algorithm to mitigate the diminishing marginal returns caused by redundant or interacting demonstrations.
  • Experiments on multiple real-world EHR benchmarks show GraphWalker outperforms existing ICL baselines, and the authors provide open-source code via the linked GitHub repository.

Abstract

Clinical Reasoning on Electronic Health Records (EHRs) is a fundamental yet challenging task in modern healthcare. While in-context learning (ICL) offers a promising inference-time adaptation paradigm for large language models (LLMs) in EHR reasoning, existing methods face three fundamental challenges: (1) Perspective Limitation, where data-driven similarity fails to align with LLM reasoning needs and model-driven signals are constrained by limited clinical competence; (2) Cohort Awareness, as demonstrations are selected independently without modeling population-level structure; and (3) Information Aggregation, where redundancy and interaction effects among demonstrations are ignored, leading to diminishing marginal gains. To address these challenges, we propose GraphWalker, a principled demonstration selection framework for EHR-oriented ICL. GraphWalker (i) jointly models patient clinical information and LLM-estimated information gain by integrating data-driven and model-driven perspectives, (ii) incorporates Cohort Discovery to avoid noisy local optima, and (iii) employs a Lazy Greedy Search with Frontier Expansion algorithm to mitigate diminishing marginal returns in information aggregation. Extensive experiments on multiple real-world EHR benchmarks demonstrate that GraphWalker consistently outperforms state-of-the-art ICL baselines, yielding substantial improvements in clinical reasoning performance. Our code is open-sourced at https://github.com/PuppyKnightUniversity/GraphWalker
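
To make the "lazy greedy" idea concrete, the sketch below shows the generic lazy greedy technique for selecting a small set of non-redundant items under a submodular objective. It is not the authors' implementation: the coverage objective, the function name `lazy_greedy_select`, and the toy patient feature sets are all assumptions made for illustration. GraphWalker's actual objective (LLM-estimated information gain), its Cohort Discovery step, and its Frontier Expansion step are not reproduced here.

```python
import heapq


def lazy_greedy_select(candidates, k):
    """Pick up to k candidates that maximize feature coverage.

    candidates: dict mapping a (hypothetical) demonstration name to the set
    of features it covers. Coverage is a stand-in objective; it is submodular,
    so a candidate's marginal gain can only shrink as more items are selected.
    """
    covered = set()
    selected = []
    # Heap entries are (-gain, name, stamp). The stamp records how many items
    # had been selected when the gain was computed, so stale gains are detectable.
    heap = [(-len(feats), name, 0) for name, feats in candidates.items()]
    heapq.heapify(heap)

    while heap and len(selected) < k:
        neg_gain, name, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and sits at the top of the heap. All other
            # entries are upper bounds on their true gains, so this is the best.
            if neg_gain == 0:
                break  # nothing left adds new information
            selected.append(name)
            covered |= candidates[name]
        else:
            # Stale entry: recompute only this candidate's gain and reinsert it.
            gain = len(candidates[name] - covered)
            heapq.heappush(heap, (-gain, name, len(selected)))
    return selected


if __name__ == "__main__":
    # Toy, made-up demonstration pool (assumption for illustration only).
    pool = {
        "patient_a": {"diabetes", "ckd", "hypertension"},
        "patient_b": {"ckd", "hypertension"},
        "patient_c": {"sepsis", "aki"},
        "patient_d": {"diabetes"},
    }
    print(lazy_greedy_select(pool, k=2))
```

In this toy pool, `patient_b` starts with a high gain, but its gain drops to zero once `patient_a` is selected, so the redundant demonstration is skipped in favor of `patient_c`. The lazy part is that gains are recomputed only for candidates that reach the top of the heap, rather than for every candidate in every round.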