Agentic Large Language Models for Training-Free Neuro-Radiological Image Analysis

arXiv cs.CV / 4/21/2026

Key Points

  • The paper addresses a key limitation of current LLM-based visual question answering: the models typically lack the native 3D spatial reasoning needed to analyze volumetric medical images such as CT and MRI directly.
  • It proposes a training-free, agentic pipeline where LLMs orchestrate external domain-specific tools to perform end-to-end brain MRI workflows, including preprocessing, pathology segmentation, and volumetric analysis.
  • The authors validate the approach on multiple LLMs (GPT-5.1, Gemini 3 Pro, and Claude Sonnet 4.5) using off-the-shelf neuro-radiology tools, and test it across tasks that increase in complexity, including longitudinal multi-timepoint response assessment.
  • They study architectural choices by comparing single-agent setups against multi-agent “domain-expert” collaborations to evaluate how design affects performance.
  • To enable rigorous evaluation of future agentic systems, they release a benchmark dataset of image-prompt-answer tuples derived from public BraTS data.
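
To make the orchestration idea in the key points concrete, here is a minimal Python sketch of a tool-dispatch loop. It is not the authors' implementation: the tool functions, the `TOOLS` registry, the `run_plan` helper, the file name, and the fixed voxel count are all hypothetical placeholders, and the plan is hard-coded where a real agent would have the LLM choose the next tool call.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]

# Hypothetical stand-ins for external neuro-radiology tools.
# Real tools would read and write image volumes on disk.
def skull_strip(state: State) -> State:
    # Placeholder: mark the scan as having non-brain tissue removed.
    return {**state, "skull_stripped": True}

def segment_tumor(state: State) -> State:
    # Placeholder: a real tool would return a voxel mask; we fake a voxel count.
    if not state.get("skull_stripped"):
        raise ValueError("segmentation expects a skull-stripped scan")
    return {**state, "tumor_voxels": 1200}

def measure_volume(state: State) -> State:
    # Volume in millilitres = voxel count * voxel volume (mm^3) / 1000.
    volume_ml = state["tumor_voxels"] * state["voxel_volume_mm3"] / 1000.0
    return {**state, "tumor_volume_ml": volume_ml}

# Registry mapping tool names (as an LLM might emit them) to callables.
TOOLS: Dict[str, Callable[[State], State]] = {
    "skull_strip": skull_strip,
    "segment_tumor": segment_tumor,
    "measure_volume": measure_volume,
}

def run_plan(plan: List[str], state: State) -> State:
    """Execute the named tools in order, threading the state through each step."""
    for name in plan:
        if name not in TOOLS:
            raise KeyError(f"unknown tool: {name}")
        state = TOOLS[name](state)
    return state

# Assumption: in an agentic system this plan would come from the LLM.
plan = ["skull_strip", "segment_tumor", "measure_volume"]
result = run_plan(plan, {"scan": "example_scan.nii.gz", "voxel_volume_mm3": 1.0})
print(result["tumor_volume_ml"])
```

The point of the sketch is the division of labor: the language model only selects and sequences tools, while all 3D image processing happens inside the tools themselves.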

Abstract

State-of-the-art large language models (LLMs) show high performance in general visual question answering. However, a fundamental limitation remains: current architectures lack the native 3D spatial reasoning required for direct analysis of volumetric medical imaging, such as CT or MRI. Emerging agentic AI offers a new solution, eliminating the need for intrinsic 3D processing by enabling LLMs to orchestrate and leverage specialized external tools. Yet, the feasibility of such agentic frameworks in complex, multi-step radiological workflows remains underexplored. In this work, we present a training-free agentic pipeline for automated brain MRI analysis. Validating our methodology on several LLMs (GPT-5.1, Gemini 3 Pro, Claude Sonnet 4.5) with off-the-shelf domain-specific tools, our system autonomously executes complex end-to-end workflows, including preprocessing (skull stripping, registration), pathology segmentation (glioma, meningioma, metastases), and volumetric analysis. We evaluate our framework across increasingly complex radiological tasks, from single-scan segmentation and volumetric reporting to longitudinal response assessment requiring multi-timepoint comparisons. We analyze the impact of architectural design by comparing single-agent models against multi-agent "domain-expert" collaborations. Finally, to support rigorous evaluation of future agentic systems, we introduce and release a benchmark dataset of image-prompt-answer tuples derived from public BraTS data. Our results demonstrate that agentic AI can solve complex neuro-radiological image analysis tasks through tool use without the need for training or fine-tuning.
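
The volumetric reporting and multi-timepoint comparison mentioned in the abstract reduce, at their simplest, to counting segmented voxels and comparing volumes across scans. The Python sketch below illustrates that arithmetic under stated assumptions: the function names, the toy masks, and the 1 mm isotropic spacing are hypothetical, and the paper's actual tools and response criteria are not described in this summary.

```python
from typing import Sequence, Tuple

# A binary segmentation mask as nested lists: mask[z][y][x] is 0 or 1.
Mask = Sequence[Sequence[Sequence[int]]]

def mask_volume_ml(mask: Mask, spacing_mm: Tuple[float, float, float]) -> float:
    """Volume of the segmented region in millilitres (1 mL = 1000 mm^3)."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    n_voxels = sum(1 for plane in mask for row in plane for v in row if v != 0)
    return n_voxels * voxel_mm3 / 1000.0

def percent_change(baseline_ml: float, followup_ml: float) -> float:
    """Relative volume change from baseline to follow-up, in percent."""
    if baseline_ml <= 0:
        raise ValueError("baseline volume must be positive")
    return (followup_ml - baseline_ml) / baseline_ml * 100.0

# Toy example: an 8-voxel lesion at baseline shrinks to 6 voxels at follow-up.
baseline_mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
followup_mask = [[[1, 1], [1, 0]], [[1, 1], [1, 0]]]
spacing = (1.0, 1.0, 1.0)  # assumed isotropic 1 mm voxels

v0 = mask_volume_ml(baseline_mask, spacing)
v1 = mask_volume_ml(followup_mask, spacing)
change = percent_change(v0, v1)
print(f"baseline {v0:.3f} mL, follow-up {v1:.3f} mL, change {change:.1f}%")
```

A comparison like this is only meaningful when both timepoints share a coordinate space, which is presumably why registration appears among the preprocessing steps.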