Toward Multimodal Conversational AI for Age-Related Macular Degeneration

arXiv cs.CL / April 29, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that current deep learning retinal-disease systems often output static predictions and lack interactive clinical reasoning and explanations.
  • It introduces OcularChat, an MLLM fine-tuned from Qwen2.5-VL using simulated patient–physician dialogues to perform visual question answering on color fundus photographs for diagnosing age-related macular degeneration (AMD).
  • Training uses 705,850 simulated dialogues paired with 46,167 fundus images so the model can identify key AMD features and generate reasoned predictions.
  • Experiments on AREDS and AREDS2 show strong classification accuracy, with OcularChat reported to outperform existing MLLMs and to receive higher average ophthalmologist grades across multiple tasks and in overall impression.
  • The results suggest multimodal conversational AI could provide accurate, interpretable, and clinically useful image-based AMD diagnosis with interactive explanation capabilities.
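The training setup described above pairs each color fundus photograph with simulated patient–physician dialogues. As a rough illustration only, here is a minimal sketch of what one such visual question answering training record might look like; the field names, file path, and dialogue text are hypothetical assumptions, not the paper's actual data format:

```python
# Hypothetical structure for one simulated-dialogue training record.
# Field names, the image path, and the dialogue content are illustrative
# assumptions -- the paper does not publish its data schema.

def make_training_record(image_path, dialogue_turns):
    """Pair one color fundus photograph (CFP) with a simulated dialogue.

    dialogue_turns: list of (role, text) tuples, e.g. alternating
    "user" (patient/clinician question) and "assistant" (model answer).
    """
    return {
        "image": image_path,  # path to the CFP
        "conversations": [
            {"role": role, "content": text}
            for role, text in dialogue_turns
        ],
    }

record = make_training_record(
    "cfp_000123.jpg",
    [
        ("user", "What drusen size category does this fundus photo show?"),
        ("assistant", "The macula shows multiple large drusen, "
                      "consistent with intermediate AMD."),
    ],
)
```

With roughly 705,850 dialogues over 46,167 images, each CFP would on average appear in about 15 such records, each probing a different feature (advanced AMD, pigmentary abnormalities, drusen size).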

Abstract

Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) make it possible to integrate diagnostic predictions with clinically meaningful dialogue to support clinical decision-making and patient counseling. In this study, OcularChat, an MLLM, was fine-tuned from Qwen2.5-VL using simulated patient–physician dialogues to diagnose age-related macular degeneration (AMD) through visual question answering on color fundus photographs (CFPs). A total of 705,850 simulated dialogues paired with 46,167 CFPs were generated to train OcularChat to identify key AMD features and produce reasoned predictions. OcularChat demonstrated strong classification performance on AREDS, achieving accuracies of 0.954, 0.849, and 0.678 on the three diagnostic tasks (advanced AMD, pigmentary abnormalities, and drusen size), significantly outperforming existing MLLMs. On AREDS2, OcularChat remained the top-performing method on all tasks. Across three independent ophthalmologist graders, OcularChat achieved higher mean scores than a strong baseline model for advanced AMD (3.503 vs. 2.833), pigmentary abnormalities (3.272 vs. 2.828), drusen size (3.064 vs. 2.433), and overall impression (2.978 vs. 2.464) on a 5-point clinical grading rubric. Beyond strong objective performance in AMD severity classification, OcularChat demonstrated the ability to provide diagnostic reasoning, clinically relevant explanations, and interactive dialogue, with high performance in subjective ophthalmologist evaluation. These findings suggest that MLLMs may enable accurate, interpretable, and clinically useful image-based diagnosis and classification of AMD.