Real-World Doctor Agent with Proactive Consultation through Multi-Agent Reinforcement Learning
arXiv cs.CL / 5/1/2026
📰 News · Models & Research
Key Points
- The paper argues that current LLM-based clinical consultation systems often fail because single-turn prompts demand all symptoms at once and static supervised dialogue models cannot build understanding through active, multi-turn reasoning.
- It introduces DoctorAgent-RL, a reinforcement learning–based multi-agent collaborative framework that trains a doctor agent (on Qwen2.5-7B-Instruct) to learn an optimal questioning strategy under uncertainty.
- The approach reformulates consultations as dynamic decision-making, using strategic questions to progressively elicit key patient information across turns.
- To enable realistic training, the authors created MTMedDialog, a new English multi-turn medical consultation dataset specifically designed for interactive, dynamic training.
- Evaluation reportedly includes blinded human assessments and real-patient trials; DoctorAgent-RL achieves a 70% exact diagnostic match rate and outperforms frontier models, suggesting potential to support clinicians by handling initial screenings.
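The reformulation of consultation as dynamic decision-making can be illustrated with a toy sketch: a doctor agent queries symptoms turn by turn, then commits to a diagnosis, receiving a reward for an exact match minus a small per-question cost that encourages strategic, efficient questioning. Everything below — the disease profiles, the patient simulator, the reward weights, and the naive policy — is an illustrative assumption, not the paper's actual environment or training setup.

```python
# Toy consultation-as-RL environment (illustrative assumptions only).
# A doctor agent asks about symptoms across turns; a scripted patient
# simulator answers; the episode ends with a diagnosis and a reward.

CASES = {
    "flu": {"fever", "cough", "fatigue"},
    "migraine": {"headache", "nausea", "light_sensitivity"},
}

class ConsultationEnv:
    def __init__(self, disease):
        self.truth = disease
        self.symptoms = CASES[disease]
        self.questions_asked = 0
        self.revealed = set()

    def ask(self, symptom):
        """Doctor asks about one symptom; patient simulator answers."""
        self.questions_asked += 1
        present = symptom in self.symptoms
        if present:
            self.revealed.add(symptom)
        return present

    def diagnose(self, disease):
        """Terminal action: 1.0 for an exact match, minus a small
        per-question cost rewarding concise consultations."""
        match = 1.0 if disease == self.truth else 0.0
        return match - 0.1 * self.questions_asked

def exhaustive_doctor(env):
    """A deliberately naive baseline policy: query every known symptom,
    then pick the disease whose profile best matches the evidence.
    An RL-trained policy would instead learn which questions to ask."""
    for s in sorted(set().union(*CASES.values())):
        env.ask(s)
    best = max(CASES, key=lambda d: len(CASES[d] & env.revealed))
    return env.diagnose(best)

reward = exhaustive_doctor(ConsultationEnv("flu"))
```

Here the exhaustive baseline pays for all six questions (reward 1.0 − 0.6 = 0.4 on a correct diagnosis), which is exactly the inefficiency a learned questioning strategy is meant to avoid.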