X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis

arXiv cs.CV / 4/23/2026


Key Points

  • The paper introduces X-PCR, a new benchmark to evaluate how well multi-modal large language models (MLLMs) perform progressive clinical reasoning for ophthalmic diagnosis across a full workflow.
  • X-PCR includes two reasoning tasks: a six-stage progressive reasoning chain (from image quality assessment to clinical decision-making) and a cross-modality task that integrates six imaging modalities.
  • The benchmark contains 26,415 images and 177,868 expert-verified VQA pairs covering 52 ophthalmic diseases, curated from 51 public datasets.
  • Testing 21 MLLMs shows notable deficiencies in both progressive reasoning and cross-modal integration, indicating important gaps that remain before clinically reliable deployment.
  • The dataset and code are released publicly via the provided GitHub repository, enabling reproducible research and further benchmarking.
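To make the benchmark's structure concrete, here is a minimal sketch of how an X-PCR-style VQA pair and a per-stage evaluation might look. The field names and `stage_accuracy` helper are illustrative assumptions, not the benchmark's actual schema or evaluation code (see the GitHub repository for the real release):

```python
from dataclasses import dataclass

# Hypothetical record layout for one expert-verified VQA pair.
# Field names are assumptions for illustration only.
@dataclass
class VQAPair:
    image_path: str
    modality: str        # one of the six imaging modalities, e.g. "fundus"
    stage: int           # 1..6 in the progressive reasoning chain
    question: str
    choices: list[str]
    answer: str          # expert-verified ground truth


def stage_accuracy(pairs: list[VQAPair], predictions: list[str]) -> dict[int, float]:
    """Accuracy broken down by reasoning stage, so early failures
    (e.g. image quality assessment) can be separated from
    late-stage errors (e.g. clinical decision-making)."""
    totals: dict[int, int] = {}
    correct: dict[int, int] = {}
    for pair, pred in zip(pairs, predictions):
        totals[pair.stage] = totals.get(pair.stage, 0) + 1
        if pred == pair.answer:
            correct[pair.stage] = correct.get(pair.stage, 0) + 1
    return {s: correct.get(s, 0) / totals[s] for s in totals}
```

A per-stage breakdown like this is what makes a progressive-reasoning benchmark more informative than a single aggregate score: a model can look strong on final diagnosis while failing the earlier stages the diagnosis should rest on.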

Abstract

Despite significant progress in Multi-modal Large Language Models (MLLMs), their clinical reasoning capacity for multi-modal diagnosis remains largely unexamined. Current benchmarks, built mostly on single-modality data, cannot evaluate the progressive reasoning and cross-modal integration essential to clinical practice. We introduce the Cross-Modality Progressive Clinical Reasoning (X-PCR) benchmark, the first comprehensive evaluation of MLLMs through a complete ophthalmology diagnostic workflow, with two reasoning tasks: 1) a six-stage progressive reasoning chain spanning image quality assessment to clinical decision-making, and 2) a cross-modality reasoning task integrating six imaging modalities. The benchmark comprises 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets, covering 52 ophthalmic diseases. Evaluation of 21 MLLMs reveals critical gaps in progressive reasoning and cross-modal integration. Dataset and code: https://github.com/CVI-SZU/X-PCR.