MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management
arXiv cs.AI / 3/24/2026
Key Points
- The paper introduces MARCUS, an agentic multimodal vision-language model designed to interpret cardiac data end-to-end, handling ECGs, echocardiograms, and cardiac magnetic resonance (CMR) imaging both individually and together as multimodal inputs.
- MARCUS uses a hierarchical agentic architecture with modality-specific expert vision-language models coordinated by a multimodal orchestrator, combining domain-trained visual encoders with multi-stage language-model optimization.
- Trained on 13.5M images (including ECGs, echocardiograms, and CMR) and a curated dataset of 1.6M questions, MARCUS reports state-of-the-art results and improvements over frontier models on internal (Stanford) and external (UCSF) cohorts.
- Reported accuracies are 87–91% for ECG, 67–86% for echocardiography, and 85–88% for CMR; multimodal performance reaches 70% accuracy, substantially higher than the compared frontier systems.
- The authors claim robustness against “mirage reasoning” (unintended textual or hallucinated visual rationales) and state they are releasing models, code, and benchmarks as open source.
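The hierarchical design described above, with modality-specific experts coordinated by a multimodal orchestrator, can be illustrated with a minimal sketch. All names here (`ModalityExpert`, `Orchestrator`, the lambda experts) are hypothetical stand-ins for illustration only; the paper's actual system uses trained vision-language models and an LLM-based reasoning step, not string joining.

```python
# Sketch of a hierarchical agentic setup: modality-specific "experts"
# coordinated by an orchestrator that fuses their per-modality answers.
from typing import Callable, Dict, List


class ModalityExpert:
    """Stand-in for a modality-specific vision-language model."""

    def __init__(self, modality: str, answer_fn: Callable[[object, str], str]):
        self.modality = modality
        self.answer_fn = answer_fn

    def answer(self, data: object, question: str) -> str:
        return self.answer_fn(data, question)


class Orchestrator:
    """Routes each input to its modality expert, then fuses the findings."""

    def __init__(self, experts: Dict[str, ModalityExpert]):
        self.experts = experts

    def run(self, inputs: Dict[str, object], question: str) -> str:
        findings: List[str] = []
        for modality, data in inputs.items():
            expert = self.experts[modality]
            findings.append(f"{modality}: {expert.answer(data, question)}")
        # A real orchestrator would reason over the findings with an LLM;
        # here we simply concatenate them.
        return " | ".join(findings)


# Toy usage with placeholder experts returning canned findings.
experts = {
    "ecg": ModalityExpert("ecg", lambda d, q: "sinus rhythm"),
    "echo": ModalityExpert("echo", lambda d, q: "EF 55%"),
}
orchestrator = Orchestrator(experts)
result = orchestrator.run({"ecg": None, "echo": None}, "Assess cardiac function")
print(result)  # ecg: sinus rhythm | echo: EF 55%
```

The key design point mirrored here is the separation of concerns: each expert sees only its own modality, and cross-modal integration happens in one place, the orchestrator.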