Unified Ultrasound Intelligence Toward an End-to-End Agentic System

arXiv cs.CV / 4/21/2026

Key Points

  • The paper proposes USTri, a tri-stage pipeline aimed at unified and generalizable clinical ultrasound intelligence across multiple organs, views, devices, and tasks.
  • Stage I trains a universal generalist model (USGen) to learn transferable ultrasound priors that are robust to device and protocol variability.
  • Stage II improves task alignment under domain shifts by freezing USGen and fine-tuning dataset-specific heads (USpec) to reduce instability from cross-task interference.
  • Stage III introduces USAgent, which orchestrates specialists to perform multi-step inference and generate deterministic, structured (clinician-like) reports.
  • On the FMC_UIA validation set, USTri achieves the best overall performance across 4 task types and 27 datasets, and qualitative results indicate high-accuracy, interpretable structured reports.
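
The Stage II idea in the points above (keep a shared generalist fixed, train only a small head per dataset) can be illustrated with a toy sketch. This is not the paper's implementation: `FrozenBackbone`, `Head`, `finetune_head`, the fixed weights, and the scalar inputs are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenBackbone:
    """Hypothetical stand-in for a frozen generalist (USGen-like role).

    The weights are immutable, so no training step can change them.
    """
    weights: tuple[float, ...] = (0.5, -0.25, 1.0)

    def features(self, x: float) -> list[float]:
        # Shared representation reused by every dataset-specific head.
        return [w * x for w in self.weights]


@dataclass
class Head:
    """Hypothetical dataset-specific linear head (USpec-like role)."""
    w: list[float]
    b: float = 0.0

    def predict(self, feats: list[float]) -> float:
        return sum(wi * fi for wi, fi in zip(self.w, feats)) + self.b


def finetune_head(backbone, head, data, lr=0.05, epochs=200):
    """Plain SGD on squared error; only the head's parameters are updated."""
    for _ in range(epochs):
        for x, y in data:
            feats = backbone.features(x)      # backbone is read-only here
            err = head.predict(feats) - y
            for i, f in enumerate(feats):
                head.w[i] -= lr * err * f
            head.b -= lr * err
    return head


# Two made-up "datasets" share one backbone but get separate heads.
backbone = FrozenBackbone()
dataset_a = [(-1.0, -1.0), (0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]    # y = 2x + 1
dataset_b = [(-1.0, 1.0), (0.0, 0.0), (0.5, -0.5), (1.0, -1.0)]   # y = -x
heads = {
    "dataset_a": finetune_head(backbone, Head(w=[0.0, 0.0, 0.0]), dataset_a),
    "dataset_b": finetune_head(backbone, Head(w=[0.0, 0.0, 0.0]), dataset_b),
}
```

Because each head owns its parameters and the backbone cannot change, training on one dataset cannot disturb another, which is the property the summary attributes to the frozen-generalist design.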

Abstract

Clinical ultrasound analysis demands models that generalize across heterogeneous organs, views, and devices while supporting interpretable workflow-level analysis. Existing methods often rely on task-wise adaptation, and joint learning can be unstable due to cross-task interference, making it hard to deliver workflow-level outputs in practice. To address these challenges, we present USTri, a tri-stage ultrasound intelligence pipeline for unified multi-organ, multi-task analysis. Stage I trains a universal generalist, USGen, on diverse domains to learn broad, transferable priors that are robust to device and protocol variability. To better handle domain shifts and reach task-aligned performance while preserving shared ultrasound knowledge, Stage II builds USpec by keeping USGen frozen and fine-tuning dataset-specific heads. Stage III introduces USAgent, which mimics clinician workflows by orchestrating USpec specialists to perform multi-step inference and generate deterministic structured reports. On the FMC_UIA validation set, our model achieves the best overall performance across 4 task types and 27 datasets, outperforming state-of-the-art methods. Moreover, qualitative results show that USAgent produces clinically structured reports with high accuracy and interpretability. Our study suggests a scalable path to ultrasound intelligence that generalizes across heterogeneous ultrasound tasks and supports consistent end-to-end clinical workflows.
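
To make the Stage III description concrete, here is a minimal sketch of how an orchestrator might call specialists in a fixed order and emit a deterministic structured report. The abstract does not describe USAgent's interface, so everything here is an assumption for illustration: the function names, the `plan` tuple, the report fields, and the demo study values are all hypothetical.

```python
import json
from typing import Callable

# A specialist takes the study plus findings so far and returns its own finding.
Specialist = Callable[[dict, dict], dict]


def view_specialist(study: dict, findings: dict) -> dict:
    # Hypothetical stand-in for an organ/view classification head.
    return {"organ": study["organ"], "view": study["view"]}


def measurement_specialist(study: dict, findings: dict) -> dict:
    # Hypothetical stand-in for a measurement head.
    # It reads the earlier "view" finding, which makes this a multi-step chain.
    organ = findings["view"]["organ"]
    length_mm = round(study["length_px"] * study["mm_per_px"], 1)
    return {"target": organ, "length_mm": length_mm}


def run_agent(study: dict, plan: tuple, specialists: dict) -> str:
    """Run specialists in the fixed order given by `plan` and serialize the result.

    Determinism in this sketch comes from three choices: a fixed plan,
    no sampling anywhere, and sorted keys in the JSON output.
    """
    findings: dict = {}
    for step in plan:
        findings[step] = specialists[step](study, findings)
    report = {"study_id": study["id"], "steps": list(plan), "findings": findings}
    return json.dumps(report, sort_keys=True, indent=2)


# Made-up demo input; none of these values come from the paper.
demo_study = {
    "id": "demo-001",
    "organ": "kidney",
    "view": "long-axis",
    "length_px": 400,
    "mm_per_px": 0.25,
}
demo_specialists = {"view": view_specialist, "measure": measurement_specialist}
demo_report = run_agent(demo_study, ("view", "measure"), demo_specialists)
```

Running `run_agent` twice on the same study yields byte-identical text, which is one simple way to read "deterministic" in the abstract. A real system would replace the stand-in functions with model inference.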