Towards a Medical AI Scientist

arXiv cs.AI / 3/31/2026


Key Points

  • The paper proposes “Medical AI Scientist,” an autonomous research framework designed specifically for clinical medicine rather than generic, domain-agnostic AI scientists.
  • It grounds ideation in surveyed medical literature using a clinician-engineer co-reasoning mechanism to improve traceability of generated research ideas.
  • The framework supports evidence-grounded manuscript drafting via structured medical composition conventions and ethical policy guidance.
  • It runs in three research modes—paper-based reproduction, literature-inspired innovation, and task-driven exploration—progressing toward higher autonomy.
  • Evaluations across 171 cases, 19 clinical tasks, and 6 data modalities report substantially higher-quality ideas than commercial LLMs, higher executable-experiment success, and manuscript quality approaching MICCAI-level in double-blind expert review.

Abstract

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, which limits their applicability to clinical medicine, where research must be grounded in medical evidence and involves specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical compositional conventions and ethical policies. The framework operates under three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that the ideas generated by the Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. Our system also achieves strong alignment between the proposed method and its implementation, along with significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality while consistently surpassing those from ISBI and BIBM. The proposed Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.
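
As a rough illustration of the three research modes described in the abstract, the sketch below models them as an ordered enumeration. Only the mode names and their ordering by autonomy come from the abstract; the class name `ResearchMode`, the helper `is_more_autonomous`, and the integer values are hypothetical and are not taken from the paper or its code.

```python
from enum import IntEnum


class ResearchMode(IntEnum):
    """The three modes named in the abstract, ordered by increasing autonomy.

    The integer values are an assumption used only to encode the ordering.
    """
    PAPER_BASED_REPRODUCTION = 1
    LITERATURE_INSPIRED_INNOVATION = 2
    TASK_DRIVEN_EXPLORATION = 3


def is_more_autonomous(a: ResearchMode, b: ResearchMode) -> bool:
    """Return True if mode `a` sits at a higher autonomy level than mode `b`."""
    return a > b


# Iterating over the enum members and sorting yields lowest autonomy first.
modes_by_autonomy = sorted(ResearchMode)
```

Using an ordered enum keeps the "progressively increasing autonomy" relationship explicit, so comparisons between modes do not rely on string matching.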