One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction

arXiv cs.AI / 4/2/2026


Key Points

  • The paper argues that LLM-based clinical prediction suffers from case-level heterogeneity, where complex cases produce divergent outputs under small prompt changes.
  • It introduces CAMP (Case-Adaptive Multi-agent Panel), in which an attending-physician agent dynamically assembles a specialist panel based on each case’s diagnostic uncertainty.
  • Specialists use three-valued voting (KEEP/REFUSE/NEUTRAL) to support principled abstention when cases fall outside their expertise.
  • A hybrid routing mechanism directs each candidate down one of three paths: strong consensus, fallback to the attending physician, or evidence-based arbitration that weighs argument quality rather than raw vote counts.
  • Experiments on MIMIC-IV for diagnostic prediction and brief hospital-course generation across four LLM backbones show CAMP outperforms strong baselines while using fewer tokens, with voting/arbitration traces enabling decision audits.
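The three-valued voting and hybrid routing described above can be sketched as a small aggregation rule. This is a hypothetical illustration, not the paper's implementation: the `route_candidate` function, the `consensus_threshold` value, and the route labels are all assumptions made for clarity.

```python
from collections import Counter

def route_candidate(votes, consensus_threshold=0.8):
    """Route one diagnosis candidate given specialist votes.

    votes: list of "KEEP" / "REFUSE" / "NEUTRAL" strings.
    Returns "accept", "reject", "attending_fallback", or "arbitration".
    Threshold and labels are illustrative assumptions.
    """
    counts = Counter(votes)
    decided = counts["KEEP"] + counts["REFUSE"]  # NEUTRAL = principled abstention
    if decided == 0:
        # Every specialist abstained: defer to the attending physician's judgment.
        return "attending_fallback"
    keep_frac = counts["KEEP"] / decided
    if keep_frac >= consensus_threshold:
        return "accept"   # strong consensus to keep the candidate
    if keep_frac <= 1 - consensus_threshold:
        return "reject"   # strong consensus to refuse it
    # Split panel: escalate to evidence-based arbitration, which weighs
    # argument quality rather than raw vote counts.
    return "arbitration"
```

Because NEUTRAL votes are excluded from the denominator, abstaining specialists neither dilute nor inflate consensus; the vote record itself doubles as an audit trail for each routing decision.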

Abstract

Large language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from one role-conditioned distribution, and multi-agent frameworks use fixed roles with flat majority voting, discarding the diagnostic signal in disagreement. We propose CAMP (Case-Adaptive Multi-agent Panel), where an attending-physician agent dynamically assembles a specialist panel tailored to each case's diagnostic uncertainty. Each specialist evaluates candidates via three-valued voting (KEEP/REFUSE/NEUTRAL), enabling principled abstention outside one's expertise. A hybrid router directs each diagnosis through strong consensus, fallback to the attending physician's judgment, or evidence-based arbitration that weighs argument quality over vote counts. On diagnostic prediction and brief hospital course generation from MIMIC-IV across four LLM backbones, CAMP consistently outperforms strong baselines while consuming fewer tokens than most competing multi-agent methods, with voting records and arbitration traces offering transparent decision audits.