Development, Evaluation, and Deployment of a Multi-Agent System for Thoracic Tumor Board

arXiv cs.AI / 4/15/2026

Key Points

  • The paper describes an AI-assisted workflow that generates concise patient case summaries for live display at the Stanford Thoracic Tumor Board; the workflow began as a manually intensive process.
  • It develops multiple automated AI chart summarization methods and evaluates them against physician-written “gold standard” summaries using fact-based scoring rubrics.
  • The study reports deployment of the final automated summarization tool and includes post-deployment monitoring to assess real-world performance over time.
  • It also validates an LLM-as-a-judge evaluation strategy for fact-based scoring.
  • Overall, the work presents an end-to-end example of integrating multi-agent, LLM-based clinical documentation support into routine clinical practice.

Abstract

Tumor boards are multidisciplinary conferences dedicated to producing actionable patient care recommendations with live review of primary radiology and pathology data. Succinct patient case summaries are needed to drive efficient and accurate case discussions. We developed a manual AI-based workflow to generate patient summaries to display live at the Stanford Thoracic Tumor Board. To improve on this manually intensive process, we developed several automated AI chart summarization methods and evaluated them against physician gold standard summaries and fact-based scoring rubrics. We report these comparative evaluations as well as our deployment of the final-state automated AI chart summarization tool along with post-deployment monitoring. We also validate the use of an LLM-as-a-judge evaluation strategy for fact-based scoring. This work is an example of integrating AI-based workflows into routine clinical practice.
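
The abstract does not spell out how the fact-based scoring rubrics are computed, so the following is only a minimal sketch of one plausible reading: a rubric is a weighted list of facts, a judge decides whether a summary states each fact, and the score is the weighted fraction of facts covered. Every name here (`RubricFact`, `keyword_judge`, `score_summary`) is hypothetical, the keyword matcher is a stand-in for the LLM judge the paper validates, and the sample summary is invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricFact:
    """One rubric item (hypothetical structure, not from the paper)."""
    description: str
    keywords: List[str]   # assumption: all keywords must appear in the summary
    weight: float = 1.0


# A judge takes a summary and a rubric fact and says whether the fact is present.
# In practice this could wrap an LLM call; here it is a plain function type.
Judge = Callable[[str, RubricFact], bool]


def keyword_judge(summary: str, fact: RubricFact) -> bool:
    """Placeholder judge: case-insensitive substring match on every keyword."""
    text = summary.lower()
    return all(keyword.lower() in text for keyword in fact.keywords)


def score_summary(summary: str, rubric: List[RubricFact],
                  judge: Judge = keyword_judge) -> float:
    """Return the weighted fraction of rubric facts the judge finds in the summary."""
    total = sum(fact.weight for fact in rubric)
    if total == 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(fact.weight for fact in rubric if judge(summary, fact))
    return earned / total


# Invented example data, for illustration only.
rubric = [
    RubricFact("Histology stated", ["adenocarcinoma"], weight=2.0),
    RubricFact("Stage stated", ["stage IIIA"], weight=2.0),
    RubricFact("Smoking history stated", ["smok"], weight=1.0),
]
summary = "67M with stage IIIA lung adenocarcinoma, for discussion of resection."
score = score_summary(summary, rubric)
print(f"rubric coverage: {score:.2f}")
```

Keeping the judge as a swappable function means the same rubric could be scored by a human-labelled lookup, a keyword matcher, or an LLM call, which is the kind of comparison needed to check whether an automated judge agrees with physician scoring.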