MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management
arXiv cs.AI / 3/24/2026
Key Points
- The paper introduces MARCUS, an agentic multimodal vision-language model designed to interpret cardiac data end-to-end, handling ECGs, echocardiograms, and cardiac magnetic resonance (CMR) imaging both individually and together as multimodal inputs.
- MARCUS uses a hierarchical agentic architecture with modality-specific expert vision-language models coordinated by a multimodal orchestrator, combining domain-trained visual encoders with multi-stage language-model optimization.
- Trained on 13.5M images (including ECGs, echocardiograms, and CMR) and a curated dataset of 1.6M questions, MARCUS reports state-of-the-art results and improvements over frontier models on internal (Stanford) and external (UCSF) cohorts.
- Reported accuracies are 87–91% for ECG, 67–86% for echocardiography, and 85–88% for CMR; multimodal performance reaches 70% accuracy, substantially higher than the compared frontier systems.
- The authors claim robustness against “mirage reasoning” (unintended textual or hallucinated visual rationales) and state they are releasing models, code, and benchmarks as open source.
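The hierarchical design described above, with modality-specific experts coordinated by a multimodal orchestrator, can be illustrated with a minimal sketch. All names here (`ModalityExpert`, `Orchestrator`, the lambda experts) are hypothetical stand-ins for illustration only; the paper's actual system uses trained vision-language models and an LLM-based reasoning step, not string joining.

```python
# Sketch of a hierarchical agentic setup: modality-specific "experts"
# coordinated by an orchestrator that fuses their per-modality answers.
from typing import Callable, Dict, List


class ModalityExpert:
    """Stand-in for a modality-specific vision-language model."""

    def __init__(self, modality: str, answer_fn: Callable[[object, str], str]):
        self.modality = modality
        self.answer_fn = answer_fn

    def answer(self, data: object, question: str) -> str:
        return self.answer_fn(data, question)


class Orchestrator:
    """Routes each input to its modality expert, then fuses the findings."""

    def __init__(self, experts: Dict[str, ModalityExpert]):
        self.experts = experts

    def run(self, inputs: Dict[str, object], question: str) -> str:
        findings: List[str] = []
        for modality, data in inputs.items():
            expert = self.experts[modality]
            findings.append(f"{modality}: {expert.answer(data, question)}")
        # A real orchestrator would reason over the findings with an LLM;
        # here we simply concatenate them.
        return " | ".join(findings)


# Toy usage with placeholder experts returning canned findings.
experts = {
    "ecg": ModalityExpert("ecg", lambda d, q: "sinus rhythm"),
    "echo": ModalityExpert("echo", lambda d, q: "EF 55%"),
}
orchestrator = Orchestrator(experts)
result = orchestrator.run({"ecg": None, "echo": None}, "Assess cardiac function")
print(result)  # ecg: sinus rhythm | echo: EF 55%
```

The key design point mirrored here is the separation of concerns: each expert sees only its own modality, and cross-modal integration happens in one place, the orchestrator.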