MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

arXiv cs.CV / 3/31/2026

Key Points

  • The paper introduces MEDIC-AD, a clinically oriented medical vision-language model designed to make lesion detection, symptom tracking, and visual explainability more actionable for real-world clinical use.
  • It uses a stage-wise framework with anomaly-aware tokens (<Ano>) to emphasize abnormal regions, improving lesion-centered representations for more accurate anomaly detection and segmentation.
  • It adds inter-image difference tokens (<Diff>) to encode temporal changes across longitudinal studies, enabling the model to categorize disease trajectory as worsening, improving, or stable.
  • A dedicated explainability stage trains the system to output lesion-focused heatmaps that align with the model’s reasoning and serve as clinically faithful visual evidence.
  • The staged design achieves state-of-the-art results against closed-source and medical-specialized baselines, and evaluations on longitudinal clinical data from real hospital workflows show stable predictions suitable for patient monitoring and decision-support contexts.
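
To make the token idea in the points above concrete, here is a minimal sketch of how <Ano> and <Diff> placeholders might be laid out in an input sequence for a two-study comparison. This summary does not say how the paper wires these tokens into the model, so the layout, the token counts, the patch placeholder names, and the helper functions `study_tokens` and `longitudinal_tokens` are all assumptions made for illustration.

```python
# Illustrative only: the layout below is an assumption, not the paper's design.
ANO = "<Ano>"    # anomaly-aware token (name taken from the summary)
DIFF = "<Diff>"  # inter-image difference token (name taken from the summary)


def study_tokens(name, num_patches, num_ano=2):
    """Return <Ano> tokens followed by placeholder patch tokens for one study."""
    patches = [f"<{name}_patch_{i}>" for i in range(num_patches)]
    return [ANO] * num_ano + patches


def longitudinal_tokens(num_patches, num_ano=2, num_diff=2):
    """Return <Diff> tokens followed by the prior and current study sequences."""
    return (
        [DIFF] * num_diff
        + study_tokens("prior", num_patches, num_ano)
        + study_tokens("current", num_patches, num_ano)
    )


tokens = longitudinal_tokens(num_patches=4)
print(len(tokens))   # 14 = 2 <Diff> + 2 * (2 <Ano> + 4 patches)
print(tokens[:4])    # ['<Diff>', '<Diff>', '<Ano>', '<Ano>']
```

In a trained model each placeholder would correspond to a learnable embedding vector. The strings here only show where such vectors could sit relative to the image patches of each study.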

Abstract

Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence that is consistent with the model's reasoning. Through our staged design, MEDIC-AD steadily boosts performance across anomaly detection, symptom tracking, and anomaly segmentation, achieving state-of-the-art results compared with both closed-source and medical-specialized baselines. Evaluations on longitudinal clinical data collected from real hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.
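
As a rough illustration of how a lesion heatmap can be compared against a reference annotation, the sketch below thresholds a heatmap into a binary mask and computes the Dice overlap with a reference mask. The abstract does not state which segmentation metric or threshold the authors use, so the Dice coefficient, the 0.5 threshold, the toy 4x4 arrays, and the function names `binarize` and `dice` are assumptions chosen for the example.

```python
def binarize(heatmap, threshold=0.5):
    """Turn a 2D heatmap of scores in [0, 1] into a 0/1 mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in heatmap]


def dice(pred, ref):
    """Dice coefficient between two 0/1 masks of the same shape."""
    intersection = sum(
        p & r for pred_row, ref_row in zip(pred, ref) for p, r in zip(pred_row, ref_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, ref))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy heatmap (hypothetical values) with a bright 2x2 region in the centre.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.0],
    [0.0, 0.1, 0.1, 0.0],
]
# Toy reference lesion mask covering the same region plus one extra pixel.
reference = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

score = dice(binarize(heatmap), reference)
print(round(score, 3))  # 0.889 (4 shared pixels, 4 predicted, 5 reference)
```

A score near 1 means the highlighted region closely matches the annotated lesion. The result depends on the chosen threshold, so any real evaluation would need to fix or sweep it.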