X-PCR: A Benchmark for Cross-modality Progressive Clinical Reasoning in Ophthalmic Diagnosis

arXiv cs.CV / 4/23/2026


Key Points

  • The paper introduces X-PCR, a new benchmark to evaluate how well multi-modal large language models (MLLMs) perform progressive clinical reasoning for ophthalmic diagnosis across a full workflow.
  • X-PCR includes two reasoning tasks: a six-stage progressive reasoning chain (from image quality assessment to clinical decision-making) and a cross-modality task that integrates six imaging modalities.
  • The benchmark contains 26,415 images and 177,868 expert-verified VQA pairs covering 52 ophthalmic diseases, curated from 51 public datasets.
  • Testing 21 MLLMs shows notable deficiencies in both progressive reasoning and cross-modal integration, indicating important gaps that remain before clinically reliable deployment.
  • The dataset and code are released publicly via the provided GitHub repository, enabling reproducible research and further benchmarking.
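To make the benchmark's structure concrete, here is a minimal sketch of how an X-PCR-style VQA pair and a per-stage evaluation might look. The field names and `stage_accuracy` helper are illustrative assumptions, not the benchmark's actual schema or evaluation code (see the GitHub repository for the real release):

```python
from dataclasses import dataclass

# Hypothetical record layout for one expert-verified VQA pair.
# Field names are assumptions for illustration only.
@dataclass
class VQAPair:
    image_path: str
    modality: str        # one of the six imaging modalities, e.g. "fundus"
    stage: int           # 1..6 in the progressive reasoning chain
    question: str
    choices: list[str]
    answer: str          # expert-verified ground truth


def stage_accuracy(pairs: list[VQAPair], predictions: list[str]) -> dict[int, float]:
    """Accuracy broken down by reasoning stage, so early failures
    (e.g. image quality assessment) can be separated from
    late-stage errors (e.g. clinical decision-making)."""
    totals: dict[int, int] = {}
    correct: dict[int, int] = {}
    for pair, pred in zip(pairs, predictions):
        totals[pair.stage] = totals.get(pair.stage, 0) + 1
        if pred == pair.answer:
            correct[pair.stage] = correct.get(pair.stage, 0) + 1
    return {s: correct.get(s, 0) / totals[s] for s in totals}
```

A per-stage breakdown like this is what makes a progressive-reasoning benchmark more informative than a single aggregate score: a model can look strong on final diagnosis while failing the earlier stages the diagnosis should rest on.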

Abstract

Despite significant progress in Multi-modal Large Language Models (MLLMs), their clinical reasoning capacity for multi-modal diagnosis remains largely unexamined. Current benchmarks, built mostly on single-modality data, cannot evaluate the progressive reasoning and cross-modal integration essential to clinical practice. We introduce the Cross-Modality Progressive Clinical Reasoning (X-PCR) benchmark, the first comprehensive evaluation of MLLMs through a complete ophthalmology diagnostic workflow, with two reasoning tasks: 1) a six-stage progressive reasoning chain spanning image quality assessment to clinical decision-making, and 2) a cross-modality reasoning task integrating six imaging modalities. The benchmark comprises 26,415 images and 177,868 expert-verified VQA pairs curated from 51 public datasets, covering 52 ophthalmic diseases. Evaluation of 21 MLLMs reveals critical gaps in progressive reasoning and cross-modal integration. Dataset and code: https://github.com/CVI-SZU/X-PCR.