MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies

arXiv cs.CV / 3/27/2026


Key Points

  • The paper argues that current evaluations of medical vision-language models oversimplify clinical practice by using curated 2D images rather than requiring agents to explore full 3D, multi-sequence/multi-modality studies.
  • It proposes MEDOPENCLAW, an auditable runtime that enables VLM-based agents to operate dynamically inside standard medical viewers/tools such as 3D Slicer.
  • It introduces MEDFLOWBENCH, a full-study benchmark for multi-sequence brain MRI and lung CT/PET that compares agentic performance across viewer-only, tool-use, and open-method settings.
  • Initial results show a performance paradox: strong LLMs/VLMs can complete basic study navigation in viewer-only mode, but degrade when given access to professional support tools, which the authors attribute to a lack of precise spatial grounding.
  • The authors position MEDOPENCLAW and MEDFLOWBENCH as a reproducible foundation for building and evaluating auditable, interactive medical imaging agents.

Abstract

Currently, evaluating vision-language models (VLMs) in medical imaging tasks oversimplifies clinical reality by relying on pre-selected 2D images that demand significant manual labor to curate. This setup misses the core challenge of real-world diagnostics: a true clinical agent must actively navigate full 3D volumes across multiple sequences or modalities to gather evidence and ultimately support a final decision. To address this, we propose MEDOPENCLAW, an auditable runtime designed to let VLMs operate dynamically within standard medical tools or viewers (e.g., 3D Slicer). On top of this runtime, we introduce MEDFLOWBENCH, a full-study medical imaging benchmark covering multi-sequence brain MRI and lung CT/PET. It systematically evaluates medical agentic capabilities across viewer-only, tool-use, and open-method tracks. Initial results reveal a critical insight: while state-of-the-art LLMs/VLMs (e.g., Gemini 3.1 Pro and GPT-5.4) can successfully navigate the viewer to solve basic study-level tasks, their performance paradoxically degrades when given access to professional support tools due to a lack of precise spatial grounding. By bridging the gap between static-image perception and interactive clinical workflows, MEDOPENCLAW and MEDFLOWBENCH establish a reproducible foundation for developing auditable, full-study medical imaging agents.
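The "auditable runtime" idea described above can be illustrated with a minimal sketch: every viewer action an agent takes (scrolling slices, switching sequences) is appended to a replayable log, so a reviewer can reconstruct exactly how the agent gathered evidence. This is a hypothetical illustration only, not MEDOPENCLAW's actual API; the class, method names, and sequence labels are invented for the example.

```python
import json


class AuditedViewer:
    """Hypothetical stand-in for a medical viewer backend (e.g., a 3D Slicer
    wrapper). Illustrates the auditable-runtime concept: an append-only action
    log that makes an agent's navigation replayable and reviewable."""

    def __init__(self, num_slices=64, sequences=("T1", "T2", "FLAIR")):
        self.num_slices = num_slices
        self.sequences = list(sequences)
        self.slice_idx = 0
        self.sequence = self.sequences[0]
        self.log = []  # append-only audit trail of (action, params, resulting state)

    def _record(self, action, **params):
        # Snapshot the post-action viewer state alongside the action itself,
        # so the trail can be replayed or audited step by step.
        self.log.append({
            "action": action,
            "params": params,
            "state": {"slice": self.slice_idx, "sequence": self.sequence},
        })

    def scroll_to(self, idx):
        """Move the viewer to a given slice, clamped to the volume bounds."""
        self.slice_idx = max(0, min(self.num_slices - 1, idx))
        self._record("scroll_to", idx=idx)

    def switch_sequence(self, name):
        """Switch the displayed MRI sequence, ignoring unknown names."""
        if name in self.sequences:
            self.sequence = name
        self._record("switch_sequence", name=name)

    def audit_json(self):
        """Serialize the full action trail for external review."""
        return json.dumps(self.log, indent=2)


# An agent exploring a study leaves a complete, inspectable trace:
viewer = AuditedViewer()
viewer.switch_sequence("FLAIR")
viewer.scroll_to(32)
```

The design choice worth noting is that the log records state *after* each action, not just the commands issued: this is what makes failures like the spatial-grounding degradation observed in the tool-use track diagnosable after the fact.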