AI Navigate

UltrasoundAgents: Hierarchical Multi-Agent Evidence-Chain Reasoning for Breast Ultrasound Diagnosis

arXiv cs.CV / 3/12/2026


Key Points

  • The paper proposes UltrasoundAgents, a hierarchical multi-agent framework for breast ultrasound diagnosis that improves evidence traceability and aligns with clinical workflows.
  • A main agent localizes the lesion and triggers a crop-and-zoom operation, while a sub-agent analyzes the local view and predicts four clinically relevant attributes: echogenicity pattern, calcification, boundary type, and edge (margin) morphology.
  • The main agent combines these attributes to output the BI-RADS category and malignancy prediction, along with intermediate evidence that can be reviewed.
  • To mitigate training challenges such as error propagation, the authors propose a decoupled progressive training strategy: first train the attribute agent, then train the main agent with oracle attributes, and finally apply corrective trajectory self-distillation with spatial supervision.
  • Experiments report improvements over strong vision-language baselines in diagnostic accuracy and attribute agreement, demonstrating more structured evidence and traceable reasoning.
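The two-agent inference flow described above (localize, crop-and-zoom, attribute analysis, evidence integration) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent interfaces (`localize`, `analyze`, `integrate`), the attribute field names, and the nearest-neighbour zoom are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical structured output of the attribute sub-agent,
# covering the four clinically relevant attributes from the paper.
@dataclass
class LesionAttributes:
    echogenicity: str   # e.g. "hypoechoic"
    calcification: str  # e.g. "microcalcifications present"
    boundary: str       # e.g. "ill-defined"
    margin: str         # e.g. "spiculated"

def crop_and_zoom(image, box, scale=2):
    """Crop the lesion bounding box from a 2-D pixel grid and
    upscale it by integer nearest-neighbour repetition (a stand-in
    for whatever interpolation the real system uses)."""
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in image[y0:y1]]
    return [[px for px in row for _ in range(scale)]
            for row in crop for _ in range(scale)]

def diagnose(image, main_agent, attribute_agent):
    # 1. Main agent localizes the lesion on the full image.
    box = main_agent.localize(image)
    # 2. Crop-and-zoom produces the local view for the sub-agent.
    local_view = crop_and_zoom(image, box)
    # 3. Sub-agent predicts the four clinical attributes.
    attrs = attribute_agent.analyze(local_view)
    # 4. Main agent integrates the structured attributes into a
    #    BI-RADS category and malignancy call, keeping the
    #    intermediate evidence reviewable.
    bi_rads, malignant = main_agent.integrate(attrs)
    return {"box": box, "attributes": attrs,
            "bi_rads": bi_rads, "malignant": malignant}
```

Returning the box and attributes alongside the final call is what makes the evidence chain auditable: a clinician can check each intermediate step, not just the BI-RADS output.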

Abstract

Breast ultrasound diagnosis typically proceeds from global lesion localization to local sign assessment and then evidence integration to assign a BI-RADS category and determine benignity or malignancy. Many existing methods rely on end-to-end prediction or provide only weakly grounded evidence, which can miss fine-grained lesion cues and limit auditability and clinical review. To align with the clinical workflow and improve evidence traceability, we propose a hierarchical multi-agent framework, termed UltrasoundAgents. A main agent localizes the lesion in the full image and triggers a crop-and-zoom operation. A sub-agent analyzes the local view and predicts four clinically relevant attributes, namely echogenicity pattern, calcification, boundary type, and edge (margin) morphology. The main agent then integrates these structured attributes to perform evidence-based reasoning and output the BI-RADS category and the malignancy prediction, while producing reviewable intermediate evidence. Furthermore, hierarchical multi-agent training often suffers from error propagation, difficult credit assignment, and sparse rewards. To alleviate this and improve training stability, we introduce a decoupled progressive training strategy. We first train the attribute agent, then train the main agent with oracle attributes to learn robust attribute-based reasoning, and finally apply corrective trajectory self-distillation with spatial supervision to build high-quality trajectories for supervised fine-tuning, yielding a deployable end-to-end policy. Experiments show consistent gains over strong vision-language baselines in diagnostic accuracy and attribute agreement, together with structured evidence and traceable reasoning.
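The decoupled progressive training strategy in the abstract can be summarized as a three-stage schedule in which different components are trainable and the attribute signal comes from a different source at each stage. The sketch below is a hypothetical schematic of that schedule, not the paper's code; the stage numbering follows the abstract, but the dictionary keys and source labels are assumptions.

```python
def trainable_components(stage):
    """Which parts of the hierarchy update at each stage of a
    (hypothetical) decoupled progressive training schedule."""
    if stage == 1:
        # Train the attribute sub-agent alone on local views with
        # ground-truth attribute labels.
        return {"attribute_agent": True, "main_agent": False,
                "attribute_source": "ground_truth_labels"}
    if stage == 2:
        # Freeze the sub-agent; train the main agent with oracle
        # attributes so reasoning is learned free of upstream errors.
        return {"attribute_agent": False, "main_agent": True,
                "attribute_source": "oracle_labels"}
    if stage == 3:
        # Corrective trajectory self-distillation with spatial
        # supervision: roll out the full pipeline, correct the
        # trajectories, and fine-tune both agents on them (SFT).
        return {"attribute_agent": True, "main_agent": True,
                "attribute_source": "self_distilled_trajectories"}
    raise ValueError(f"unknown stage: {stage}")
```

Staging the training this way addresses the failure modes the abstract names: stage 1 isolates the sub-agent so its errors cannot propagate during reasoning training, stage 2 sidesteps credit assignment by giving the main agent clean inputs, and stage 3 replaces sparse rewards with dense supervised targets built from corrected trajectories.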