Bridging Restoration and Diagnosis: A Comprehensive Benchmark for Retinal Fundus Enhancement

arXiv cs.CV / 4/7/2026


Key Points

  • The paper introduces EyeBench-V2 to better evaluate generative models for retinal fundus image enhancement using clinically relevant measures beyond standard PSNR/SSIM.
  • It addresses gaps in evaluation by providing a unified protocol that covers both paired and unpaired enhancement approaches, particularly those guided by clinical expertise.
  • EyeBench-V2 adds multi-dimensional downstream evaluations such as vessel segmentation, diabetic retinopathy (DR) grading, lesion segmentation, and robustness to unseen noise patterns.
  • The benchmark includes an expert-curated dataset and a structured manual assessment by medical experts to detect clinically critical issues such as lesion structure alterations, background color shifts, and the introduction of artificial structures.
  • The authors aim to provide actionable, task-oriented insights that help researchers choose appropriate models and guide future development toward clinically aligned enhancement systems.

Abstract

Over the past decade, generative models have demonstrated success in enhancing fundus images. However, the evaluation of these models remains a challenge. A benchmark for fundus image enhancement is needed for three main reasons: (1) Conventional denoising metrics such as PSNR and SSIM fail to capture clinically relevant features, such as lesion preservation and vessel morphology consistency, limiting their applicability in real-world settings; (2) There is a lack of unified evaluation protocols that address both paired and unpaired enhancement methods, particularly those guided by clinical expertise; and (3) An evaluation framework should provide actionable insights to guide future advancements in clinically aligned enhancement models. To address these gaps, we introduce EyeBench-V2, a benchmark designed to bridge the gap between enhancement model performance and clinical utility. Our work offers three key contributions: (1) Multi-dimensional clinical alignment through downstream evaluations: Beyond standard enhancement metrics, we assess performance across clinically meaningful tasks including vessel segmentation, diabetic retinopathy (DR) grading, generalization to unseen noise patterns, and lesion segmentation. (2) Expert-guided evaluation design: We curate a novel dataset enabling fair comparisons between paired and unpaired enhancement methods, accompanied by a structured manual assessment protocol by medical experts, which evaluates clinically critical aspects such as lesion structure alterations, background color shifts, and the introduction of artificial structures. (3) Actionable insights: Our benchmark provides a rigorous, task-oriented analysis of existing generative models, equipping clinical researchers with the evidence needed to make informed decisions, while also identifying limitations in current methods to inform the design of next-generation enhancement models.
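
The abstract's first motivation, that PSNR can miss lesion preservation, can be illustrated with a toy sketch. This is not EyeBench-V2's evaluation code: the synthetic 64x64 "fundus" image, the helper names `psnr` and `dice`, and the use of a fixed intensity threshold as a stand-in for a lesion segmentation model are all assumptions made for illustration.

```python
import numpy as np


def psnr(reference, enhanced, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    diff = reference.astype(np.float64) - enhanced.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


def dice(mask_a, mask_b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


# Hypothetical clean reference: dark background with one small bright lesion.
reference = np.full((64, 64), 0.2)
reference[30:32, 30:32] = 0.9

# Output A: pixel-perfect background, but the lesion has been erased.
lesion_erased = np.full((64, 64), 0.2)

# Output B: lesion preserved, with mild checkerboard noise of amplitude 0.05.
checker = np.indices((64, 64)).sum(axis=0) % 2
lesion_kept = reference + 0.05 * (2 * checker - 1)

# Stand-in for a downstream lesion segmenter: a fixed intensity threshold.
threshold = 0.5
reference_mask = reference > threshold

psnr_erased = psnr(reference, lesion_erased)
psnr_kept = psnr(reference, lesion_kept)
dice_erased = dice(reference_mask, lesion_erased > threshold)
dice_kept = dice(reference_mask, lesion_kept > threshold)

print(f"lesion erased: PSNR={psnr_erased:.2f} dB, lesion Dice={dice_erased:.2f}")
print(f"lesion kept:   PSNR={psnr_kept:.2f} dB, lesion Dice={dice_kept:.2f}")
```

In this toy setup the output that erases the lesion scores the higher PSNR, because the lesion covers only 4 of 4096 pixels, while the downstream Dice score ranks the two outputs the other way. That reversal is the kind of disagreement a task-oriented evaluation is meant to expose.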