Real-World Doctor Agent with Proactive Consultation through Multi-Agent Reinforcement Learning

arXiv cs.CL / 5/1/2026


Key Points

  • The paper argues that current LLM-based clinical consultation systems often fail because single-turn prompts demand all symptoms at once and static supervised dialogue models cannot build understanding through active, multi-turn reasoning.
  • It introduces DoctorAgent-RL, a reinforcement learning–based multi-agent collaborative framework that trains a doctor agent (on Qwen2.5-7B-Instruct) to learn an optimal questioning strategy under uncertainty.
  • The approach reformulates consultations as dynamic decision-making, using strategic questions to progressively elicit key patient information across turns.
  • To enable realistic training, the authors created MTMedDialog, a new English multi-turn medical consultation dataset specifically designed for interactive, dynamic training.
  • The evaluation reportedly includes blinded human assessments and real-patient trials. DoctorAgent-RL achieved a 70% exact diagnostic match rate and outperformed frontier models, which the authors take as evidence that it could support clinicians by handling initial screenings.

Abstract

Large language models (LLMs) struggle in real-world clinical consultations. Single-turn consultation systems require patients to describe all symptoms at once, which often leads to unclear complaints and vague diagnoses. Traditional dialogue models, constrained by static supervised learning, are limited to superficially imitating existing dialogue patterns and lack the ability to actively construct understanding in dynamic interactions, thus failing to achieve genuine clinical reasoning.

To address these challenges, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework, and train a doctor agent on Qwen2.5-7B-Instruct using this framework. Within this framework, a medical consultation is modeled as a dynamic decision-making process under uncertainty. The core intelligence of the doctor agent is shifted from knowing the answer to learning and mastering a questioning methodology aimed at achieving an optimal diagnosis. Through strategic questioning, it guides the progressive emergence of key patient information in multi-turn dialogues. To support this high-fidelity simulation of the real diagnostic process, we constructed MTMedDialog, a novel English multi-turn medical consultation dataset designed for dynamic, interactive training.

To validate its real-world effectiveness, rigorous evaluations including blinded human assessments and trials with real patients were conducted. DoctorAgent-RL outperformed frontier models and achieved a 70% exact diagnostic match rate, confirming its potential as a collaborative tool. By handling initial screenings, it can free clinicians to focus on complex cases, thereby addressing critical issues like physician shortages and misdiagnosis risks while alleviating the strain on healthcare resources.
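
The idea of a consultation as a multi-turn decision process can be illustrated with a toy sketch. This is not the paper's implementation: the class names (`PatientSimulator`, `ConsultationEpisode`), the scripted policy, the turn budget, and the 0/1 exact-match reward are all assumptions made for illustration, and the symptom-to-diagnosis rule is a placeholder with no clinical validity. In the actual framework the doctor and patient roles are played by LLM agents and the doctor's policy is learned with RL rather than hand-written.

```python
from dataclasses import dataclass, field


@dataclass
class PatientSimulator:
    """Hypothetical patient agent: reveals a fact only when asked about it."""
    hidden_facts: dict   # topic -> answer, unknown to the doctor at the start
    diagnosis: str       # ground-truth label used only for the final reward

    def answer(self, topic: str) -> str:
        return self.hidden_facts.get(topic, "I'm not sure.")


@dataclass
class ConsultationEpisode:
    """One consultation: the doctor either asks a question or commits to a diagnosis."""
    patient: PatientSimulator
    max_turns: int = 5
    history: list = field(default_factory=list)  # (topic, reply) pairs so far

    def step(self, action):
        """action is ("ask", topic) or ("diagnose", label).

        Returns (observation, reward, done).
        """
        kind, value = action
        if kind == "diagnose":
            # Assumed reward: 1 for an exact (case-insensitive) match, else 0.
            correct = value.strip().lower() == self.patient.diagnosis.lower()
            return "", (1.0 if correct else 0.0), True
        reply = self.patient.answer(value)
        self.history.append((value, reply))
        # Running out of turns ends the episode with no diagnosis and no reward.
        return reply, 0.0, len(self.history) >= self.max_turns


def scripted_doctor(history):
    """Hand-written placeholder for the learned questioning policy."""
    asked = {topic for topic, _ in history}
    for topic in ("duration", "fever", "cough"):
        if topic not in asked:
            return ("ask", topic)
    answers = dict(history)
    if answers.get("fever") == "yes" and answers.get("cough") == "yes":
        return ("diagnose", "influenza")
    return ("diagnose", "common cold")


def run_episode(patient, policy, max_turns=5):
    """Roll out one consultation and return (total_reward, questions_asked)."""
    episode = ConsultationEpisode(patient, max_turns)
    total, done = 0.0, False
    while not done:
        _, reward, done = episode.step(policy(episode.history))
        total += reward
    return total, len(episode.history)


patient = PatientSimulator(
    hidden_facts={"duration": "3 days", "fever": "yes", "cough": "yes"},
    diagnosis="Influenza",
)
print(run_episode(patient, scripted_doctor))
```

Averaging the episode reward over many simulated patients gives an exact-match rate in the same spirit as the figure quoted in the abstract, although the paper's own metric definition and reward design may differ from this sketch.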