Retinal Disease Classification from Fundus Images using CNN Transfer Learning

arXiv cs.CV / 3/26/2026

Key Points

  • The paper proposes a reproducible deep learning pipeline for binary classification of retinal disease risk from publicly available fundus images, comparing CNN models and evaluating them on held-out data.
  • A transfer learning approach with a pretrained VGG16 backbone is compared against a baseline CNN, focusing on generalization performance.
  • To mitigate class imbalance, the authors apply class weighting and report multiple metrics including accuracy, precision, recall, F1-score, confusion matrices, and ROC-AUC.
  • The VGG16 transfer learning model reaches 90.8% test accuracy and a weighted F1-score of 0.90, outperforming the baseline CNN’s 83.1% accuracy.
  • The study highlights persistent challenges, especially sensitivity for minority disease cases, and discusses practical issues such as dataset characteristics and threshold selection for more clinically reliable screening.
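
The class weighting mentioned above can be illustrated with a minimal sketch. The summary does not say which weighting formula the authors use; the version below assumes the common "balanced" heuristic (n_samples / (n_classes * class_count)), and both the function name `balanced_class_weights` and the 800/200 label split are hypothetical, chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Assumed heuristic (not confirmed by the paper):
        weight = n_samples / (n_classes * class_count)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 0 = low risk (majority), 1 = disease risk (minority).
labels = [0] * 800 + [1] * 200
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape is what training APIs typically accept as per-class loss weights, so errors on the minority class contribute more to the loss than errors on the majority class.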

Abstract

Retinal diseases remain among the leading preventable causes of visual impairment worldwide. Automated screening based on fundus image analysis has the potential to expand access to early detection, particularly in underserved populations. This paper presents a reproducible deep learning pipeline for binary retinal disease risk classification from publicly available fundus photographs. We implement and compare a baseline convolutional neural network with a transfer learning approach using a pretrained VGG16 backbone and evaluate generalization on held-out data. To address class imbalance, we apply class weighting and report standard classification metrics including accuracy, precision, recall, F1-score, confusion matrices, and ROC-AUC. The VGG16 transfer learning model achieves 90.8% test accuracy with a weighted F1-score of 0.90, substantially outperforming the baseline CNN (83.1% accuracy). Results indicate that transfer learning improves discrimination compared to a baseline CNN, while also revealing remaining challenges in sensitivity to minority disease cases. We discuss practical limitations related to dataset characteristics, class imbalance, and threshold selection, and provide guidance for reproducibility and future improvements for clinically reliable screening.
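
To make the threshold-selection issue concrete, the following sketch shows how moving the decision threshold trades sensitivity (recall on disease cases) against specificity. It is not the authors' evaluation code: the function names, the toy labels, and the toy scores are all hypothetical, and the scores merely stand in for a model's predicted disease probabilities.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        pred = 1 if score >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_score, threshold):
    """Return (sensitivity, specificity) at the given decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical toy data: 6 healthy images (0) and 4 disease images (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.35, 0.55, 0.70, 0.90]

for threshold in (0.5, 0.3):
    sens, spec = sensitivity_specificity(y_true, y_score, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 recovers the one missed disease case (sensitivity rises from 0.75 to 1.00) at the cost of more false alarms (specificity falls from about 0.83 to 0.50), which is the kind of trade-off a screening deployment has to weigh.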