EviCare: Enhancing Diagnosis Prediction with Deep Model-Guided Evidence for In-Context Reasoning

arXiv cs.CL / 4/14/2026


Key Points

  • The paper introduces EviCare, a framework for diagnosis prediction from EHRs that uses deep model guidance to improve LLM in-context reasoning and reduce overfitting to historically observed diagnoses.
  • Instead of directly prompting LLMs with raw EHRs, EviCare performs deep model candidate selection, evidential prioritization for set-based records, and relational evidence construction to better handle novel diagnosis prediction.
  • The framework composes these guidance signals into an adaptive in-context prompt intended to yield both higher accuracy and improved interpretability.
  • Experiments on MIMIC-III and MIMIC-IV show EviCare delivers significant gains, outperforming LLM-only and deep model-only baselines by an average of 20.65% across precision and accuracy.
  • Improvements are strongest on novel diagnosis prediction, with average gains of 30.97%, indicating the approach is especially effective for novel yet clinically important conditions.
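To make the "novel diagnosis" result concrete, here is a minimal sketch of how a precision-style metric could be restricted to novel diagnoses. It assumes "novel" means a diagnosis that does not appear in the patient's prior history; the summary does not give the paper's exact metric definitions, so the function name `novel_precision_at_k` and its logic are illustrative only.

```python
def novel_precision_at_k(
    ranked_predictions: list[str],
    true_diagnoses: set[str],
    history: set[str],
    k: int,
) -> float:
    """Precision over the top-k predictions that are NOT in the patient's history.

    Assumption: a diagnosis is "novel" when it is absent from `history`.
    This is an illustrative definition, not the paper's stated metric.
    """
    # Ground-truth diagnoses the patient has never had before.
    novel_truth = true_diagnoses - history
    # Keep only predictions outside the history, preserving rank order.
    novel_predictions = [d for d in ranked_predictions if d not in history][:k]
    if not novel_predictions:
        return 0.0
    hits = sum(1 for d in novel_predictions if d in novel_truth)
    return hits / len(novel_predictions)
```

A model that mostly repeats historical diagnoses can score well on an overall metric while scoring poorly here, which is the failure mode the key points describe.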

Abstract

Recent advances in large language models (LLMs) have enabled promising progress in diagnosis prediction from electronic health records (EHRs). However, existing LLM-based approaches tend to overfit to historically observed diagnoses, often overlooking novel yet clinically important conditions that are critical for early intervention. To address this, we propose EviCare, an in-context reasoning framework that integrates deep model guidance into LLM-based diagnosis prediction. Rather than prompting LLMs directly with raw EHR inputs, EviCare performs (1) deep model inference for candidate selection, (2) evidential prioritization for set-based EHRs, and (3) relational evidence construction for novel diagnosis prediction. These signals are then composed into an adaptive in-context prompt to guide LLM reasoning in an accurate and interpretable manner. Extensive experiments on two real-world EHR benchmarks (MIMIC-III and MIMIC-IV) demonstrate that EviCare achieves significant performance gains, consistently outperforming both LLM-only and deep model-only baselines by an average of 20.65% across precision and accuracy metrics. The improvements are particularly notable in challenging novel diagnosis prediction, yielding average improvements of 30.97%.
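The three guidance steps and the prompt composition described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: every function name, the scoring dictionaries, the relation table, and the prompt wording below are hypothetical stand-ins, and the deep model is replaced by a precomputed score dictionary.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str     # diagnosis label
    score: float  # stand-in for a deep model's predicted probability
    novel: bool   # True if the diagnosis is absent from the patient's history


def select_candidates(scores: dict[str, float], history: set[str], top_k: int = 3) -> list[Candidate]:
    """Step 1 (assumed form): keep the top-k diagnoses ranked by deep-model score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [Candidate(code, score, code not in history) for code, score in ranked]


def prioritize_evidence(record: set[str], importance: dict[str, float], top_n: int = 3) -> list[str]:
    """Step 2 (assumed form): impose an order on the unordered record set."""
    return sorted(record, key=lambda item: (-importance.get(item, 0.0), item))[:top_n]


def relational_evidence(candidates: list[Candidate], record: set[str],
                        relations: dict[tuple[str, str], str]) -> list[str]:
    """Step 3 (assumed form): link record items to novel candidates via known relations."""
    lines = []
    for cand in candidates:
        if not cand.novel:
            continue
        for item in sorted(record):
            relation = relations.get((item, cand.code))
            if relation:
                lines.append(f"{item} {relation} {cand.code}")
    return lines


def build_prompt(candidates: list[Candidate], evidence: list[str], relational: list[str]) -> str:
    """Compose the guidance signals into one in-context prompt (hypothetical wording)."""
    parts = ["Candidate diagnoses (deep model score):"]
    parts += [f"- {c.code} ({c.score:.2f}){' [novel]' if c.novel else ''}" for c in candidates]
    parts.append("Prioritized evidence from the record:")
    parts += [f"- {item}" for item in evidence]
    if relational:  # only included when a novel candidate has supporting relations
        parts.append("Relational evidence for novel candidates:")
        parts += [f"- {line}" for line in relational]
    parts.append("Select the most likely diagnoses and explain which evidence supports each.")
    return "\n".join(parts)
```

In this sketch the relational-evidence section is added only when a novel candidate has supporting relations, which is one simple reading of what an "adaptive" prompt might mean; the paper may adapt the prompt in other ways.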