Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning

arXiv cs.CL / March 31, 2026

Key Points

  • The paper proposes a counterfactual multi-agent diagnostic framework that explicitly tests competing clinical hypotheses by editing individual findings and observing changes in diagnoses.
  • It introduces the Counterfactual Probability Gap to quantify how strongly specific findings support (or weaken) a diagnosis based on confidence shifts under counterfactual case edits; a hedged sketch of this metric follows the list.
  • The framework uses counterfactual signals to drive multi-round specialist discussions, aiming to produce more interpretable and evidence-grounded reasoning trajectories.
  • Experiments across three diagnostic benchmarks and seven LLMs show consistent improvements in diagnostic accuracy over standard prompting and prior multi-agent baselines, especially on complex and ambiguous cases.
  • Human evaluation indicates the method yields reasoning that is more clinically useful, reliable, and coherent, positioning counterfactual evidence verification as a key step for trustworthy clinical decision support systems.
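
The abstract does not spell out the Counterfactual Probability Gap formally, but one natural reading is the shift in a model's confidence in a diagnosis when a single finding is edited out of the case. The Python sketch below illustrates that reading; `query_confidence` is a hypothetical stand-in for an LLM scoring call, and the removal-based edit is only one possible edit operation.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call that scores confidence (0..1)
# that `diagnosis` explains the findings in `case`.
ConfidenceFn = Callable[[List[str], str], float]


def counterfactual_probability_gap(
    case: List[str],
    finding: str,
    diagnosis: str,
    query_confidence: ConfidenceFn,
) -> float:
    """Confidence shift for `diagnosis` when `finding` is removed.

    One plausible reading of the paper's metric, not its exact
    definition: a positive gap suggests the finding supports the
    diagnosis, a negative gap suggests it weakens it.
    """
    baseline = query_confidence(case, diagnosis)
    edited = [f for f in case if f != finding]  # counterfactual edit: drop the finding
    return baseline - query_confidence(edited, diagnosis)


def rank_findings(
    case: List[str],
    diagnosis: str,
    query_confidence: ConfidenceFn,
) -> Dict[str, float]:
    """Score every finding by its gap, strongest supporters first."""
    gaps = {
        f: counterfactual_probability_gap(case, f, diagnosis, query_confidence)
        for f in case
    }
    return dict(sorted(gaps.items(), key=lambda kv: -kv[1]))
```

Under this reading, a large positive gap marks a finding as strong evidence for a diagnosis, while near-zero gaps across all findings flag a hypothesis resting on weak support, which is the kind of signal the framework reportedly feeds into specialist discussion.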

Abstract

Clinical diagnosis is a complex reasoning process in which clinicians gather evidence, form hypotheses, and test them against alternative explanations. In medical training, this reasoning is explicitly developed through counterfactual questioning (e.g., asking how a diagnosis would change if a key symptom were absent or altered) to strengthen differential diagnosis skills. As large language model (LLM)-based systems are increasingly used for diagnostic support, ensuring the interpretability of their recommendations becomes critical. However, most existing LLM-based diagnostic agents reason over fixed clinical evidence without explicitly testing how individual findings support or weaken competing diagnoses. In this work, we propose a counterfactual multi-agent diagnostic framework inspired by clinician training that makes hypothesis testing explicit and evidence-grounded. Our framework introduces counterfactual case editing to modify clinical findings and evaluate how these changes affect competing diagnoses. We further define the Counterfactual Probability Gap, a metric that quantifies how strongly individual findings support a diagnosis by measuring confidence shifts under these edits. These counterfactual signals guide multi-round specialist discussions, enabling agents to challenge unsupported hypotheses, refine differential diagnoses, and produce more interpretable reasoning trajectories. Across three diagnostic benchmarks and seven LLMs, our method consistently improves diagnostic accuracy over standard prompting and prior multi-agent baselines, with the largest gains observed in complex and ambiguous cases. Human evaluation further indicates that our framework produces more clinically useful, reliable, and coherent reasoning. These results suggest that incorporating counterfactual evidence verification is an important step toward building reliable AI systems for clinical decision support.
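
The multi-round specialist discussion is described only at a high level, so the skeleton below is a loose illustration rather than the paper's protocol: per-diagnosis counterfactual gaps down-weight hypotheses that no finding meaningfully supports, and the differential is renormalized each round. Every name here (`Differential`, `discussion_round`, the 0.05 support threshold, the 0.5 penalty) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Differential:
    """Current differential diagnosis: diagnosis -> confidence in [0, 1]."""
    confidences: Dict[str, float]


def discussion_round(
    differential: Differential,
    gaps: Dict[str, Dict[str, float]],  # diagnosis -> finding -> counterfactual gap
    support_threshold: float = 0.05,    # illustrative value, not from the paper
) -> Differential:
    """One round of 'specialists' challenging unsupported hypotheses.

    A hypothesis counts as unsupported if no finding shifts its
    confidence by at least `support_threshold` under counterfactual
    edits; such hypotheses are penalized and the rest renormalized.
    """
    updated = {}
    for dx, conf in differential.confidences.items():
        finding_gaps = gaps.get(dx, {})
        supported = any(abs(g) >= support_threshold for g in finding_gaps.values())
        updated[dx] = conf if supported else conf * 0.5  # illustrative penalty
    total = sum(updated.values()) or 1.0
    return Differential({dx: c / total for dx, c in updated.items()})


def run_discussion(
    differential: Differential,
    gaps: Dict[str, Dict[str, float]],
    rounds: int = 3,
) -> Differential:
    """Iterate the challenge-and-refine loop for a fixed number of rounds."""
    for _ in range(rounds):
        differential = discussion_round(differential, gaps)
    return differential
```

In the paper's framing, these rounds are carried out by LLM specialist agents exchanging arguments rather than by a fixed numeric rule; the fixed penalty here only stands in for that debate to make the control flow concrete.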