MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation

arXiv cs.AI / 4/20/2026


Key Points

  • The paper introduces MARCH, a multi-agent framework for automated 3D CT radiology report generation aimed at reducing clinical hallucinations.
  • MARCH assigns specialized roles to agents—Resident for initial drafting with multi-scale CT feature extraction, Fellow agents for retrieval-augmented revisions, and an Attending agent that runs iterative stance-based consensus to resolve diagnostic disagreements.
  • By emulating the professional hierarchy and iterative verification of radiology workflows, MARCH addresses limitations of existing vision-language model approaches that behave like monolithic “black boxes.”
  • Experiments on the RadGenome-ChestCT dataset show MARCH outperforms state-of-the-art baselines in both clinical fidelity and linguistic accuracy.
  • The authors argue that modeling human organizational structures can improve the reliability of AI systems in high-stakes medical settings.

Abstract

Automated 3D radiology report generation often suffers from clinical hallucinations and a lack of the iterative verification found in human practice. While recent Vision-Language Models (VLMs) have advanced the field, they typically operate as monolithic "black-box" systems without the collaborative oversight characteristic of clinical workflows. To address these challenges, we propose MARCH (Multi-Agent Radiology Clinical Hierarchy), a multi-agent framework that emulates the professional hierarchy of radiology departments and assigns specialized roles to distinct agents. MARCH utilizes a Resident Agent for initial drafting with multi-scale CT feature extraction, multiple Fellow Agents for retrieval-augmented revision, and an Attending Agent that orchestrates an iterative, stance-based consensus discourse to resolve diagnostic discrepancies. On the RadGenome-ChestCT dataset, MARCH significantly outperforms state-of-the-art baselines in both clinical fidelity and linguistic accuracy. Our work demonstrates that modeling human-like organizational structures enhances the reliability of AI in high-stakes medical domains.
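The Resident → Fellow → Attending pipeline described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: all class names, the toy stance rule, and the stand-in "features" and "references" are hypothetical, meant only to show how drafting, retrieval-augmented revision, and iterative stance-based consensus could compose.

```python
# Hypothetical sketch of a MARCH-style agent hierarchy.
# All names and rules here are illustrative; the paper's actual
# agents operate on real CT features and retrieved reports.
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str
    revisions: list = field(default_factory=list)  # prior versions

class ResidentAgent:
    """Drafts an initial report from (stand-in) multi-scale CT features."""
    def draft(self, ct_features):
        return Draft(text=f"findings from {len(ct_features)} feature scales")

class FellowAgent:
    """Revises the draft against (stand-in) retrieved reference reports."""
    def __init__(self, name, references):
        self.name, self.references = name, references

    def stance(self, draft):
        # Toy stance rule: agree once this fellow's revision is in the draft.
        return self.name in draft.text

    def revise(self, draft):
        draft.revisions.append(draft.text)
        draft.text += f"; {self.name} cross-checked {len(self.references)} refs"
        return draft

class AttendingAgent:
    """Runs stance-based rounds until all fellows agree or rounds run out."""
    def consensus(self, draft, fellows, max_rounds=3):
        for _ in range(max_rounds):
            if all(f.stance(draft) for f in fellows):
                return draft  # diagnostic consensus reached
            for f in fellows:
                if not f.stance(draft):
                    draft = f.revise(draft)
        return draft  # best effort after max_rounds

# Pipeline: Resident drafts -> Fellows revise -> Attending resolves.
draft = ResidentAgent().draft(ct_features=["coarse", "mid", "fine"])
fellows = [FellowAgent("fellow_a", ["ref1"]),
           FellowAgent("fellow_b", ["ref1", "ref2"])]
final = AttendingAgent().consensus(draft, fellows)
print(final.text)
```

The key structural point is that the Attending agent owns the loop: individual Fellows only express a stance and propose revisions, while termination (consensus or round limit) is decided at the top of the hierarchy, mirroring how an attending physician signs off on a report.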