Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

arXiv cs.CV / 5/1/2026


Key Points

  • The paper introduces Echo-α, an agentic multimodal reasoning model designed to improve ultrasound interpretation by combining accurate lesion localization with holistic clinical reasoning.
  • Echo-α uses an invoke-and-reason framework that coordinates organ-specific detector outputs, integrates them with global visual context, and produces grounded diagnostic decisions rather than relying on detector-only inference.
  • Training proceeds via a nine-task supervised curriculum followed by sequential reinforcement learning with different reward trade-offs to obtain variants focused on lesion grounding and final diagnosis.
  • On multi-center renal and breast ultrasound benchmarks (including cross-center tests), Echo-α outperforms baseline methods on both grounding and diagnosis, with the reported metrics indicating stronger generalization across centers.
  • The authors argue that agentic multimodal reasoning can convert specialized detectors into verifiable clinical evidence, and they provide a public repository for the work.
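The invoke-and-reason pattern described above can be sketched in a few lines: an agent invokes a specialized detector as a tool, packages its boxes as textual evidence, and asks a multimodal model to reason over that evidence together with the full image. This is a minimal illustration, not the paper's implementation; all names (`Detection`, `invoke_and_reason`, the `mllm.generate` interface) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    """Hypothetical detector output: one localized lesion candidate."""
    box: tuple    # (x1, y1, x2, y2) in pixel coordinates
    score: float  # detector confidence
    label: str    # e.g. "renal_cyst"

def invoke_and_reason(image, organ: str,
                      detect: Callable[[object, str], List[Detection]],
                      mllm) -> dict:
    """Sketch of the invoke-and-reason loop: the MLLM receives both the raw
    image (global context) and detector outputs (local evidence), then emits
    a diagnosis grounded in the cited boxes rather than detector-only output."""
    detections = detect(image, organ)                       # 1. invoke the organ-specific tool
    evidence = [f"{d.label} at {d.box} (p={d.score:.2f})" for d in detections]
    prompt = (
        f"Ultrasound of {organ}. Detector evidence: {evidence}. "
        "Integrate this with the full image and give a grounded diagnosis."
    )
    answer = mllm.generate(image, prompt)                   # 2. reason over evidence
    return {"evidence": detections, "diagnosis": answer}
```

The key design point is that the detector's boxes become verifiable evidence inside the prompt, so the model's final answer can be checked against concrete localizations.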

Abstract

Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide flexible reasoning but weak grounding in specialized medical domains. We present Echo-α, an agentic multimodal reasoning model for ultrasound interpretation that unifies these strengths within an invoke-and-reason framework. Echo-α is trained to coordinate organ-specific detector outputs, integrate them with global visual context, and convert the resulting evidence into grounded diagnostic decisions beyond detector-only inference. This behavior is established through a nine-task supervised curriculum and then refined by sequential reinforcement learning under different reward trade-offs, yielding Echo-α-Grounding for lesion anchoring and Echo-α-Diagnosis for final diagnosis. On multi-center renal and breast ultrasound benchmarks, Echo-α outperforms competitive baselines on both grounding and diagnosis. In particular, on cross-center test sets, Echo-α-Grounding attains 56.73%/43.78% F1@0.5 and Echo-α-Diagnosis reaches 74.90%/49.20% overall accuracy on renal/breast ultrasound. These results suggest that agentic multimodal reasoning can turn specialized detectors into verifiable clinical evidence, offering a practical route toward ultrasound AI systems that are more accurate, interpretable, and transferable. The repository is at https://github.com/MiliLab/Echo-Alpha.
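The grounding results above are reported as F1@0.5, i.e. F1 where a predicted box counts as a true positive if it overlaps a ground-truth box with IoU ≥ 0.5. A minimal sketch of that metric follows; the greedy one-to-one matching shown here is a common convention, though the paper's exact matching rule may differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def f1_at_05(preds, gts):
    """F1@0.5: each prediction is a TP if it greedily matches an unused
    ground-truth box with IoU >= 0.5; unmatched preds are FPs, unmatched
    ground truths are FNs."""
    used, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in used and iou(p, g) >= 0.5:
                used.add(i)
                tp += 1
                break
    fp, fn = len(preds) - tp, len(gts) - tp
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)
```

For example, one perfect box against one ground truth gives F1 = 1.0, while an extra spurious box drops precision to 0.5 and F1 to about 0.67.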