REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction
arXiv cs.AI, April 22, 2026
Key Points
- REVEAL (REtinal-risk Vision-Language Early Alzheimer’s Learning) is an arXiv framework that aligns retinal color fundus images with individualized, disease-specific risk profiles to predict incident Alzheimer’s disease (AD) and dementia.
- The approach addresses a key gap in prior work by jointly modeling multimodal inputs—retinal morphometric features and structured questionnaire-based risk factors—instead of treating imaging and risk factors separately.
- It converts real-world questionnaire risk factors into clinically interpretable narrative text that can be used with pretrained vision-language models (VLMs), enabling cross-modal learning.
- A group-aware contrastive learning (GACL) strategy is proposed to cluster patients with similar retinal morphometry and risk factors, improving alignment of shared patterns across modalities.
- The method reports substantially better performance than both state-of-the-art retinal imaging models paired with clinical text encoders and general-purpose VLMs, with predictions made on average ~8 years before diagnosis.
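The group-aware contrastive idea above can be illustrated with a minimal, self-contained sketch. This is not the paper's actual GACL formulation (its grouping criteria, similarity function, and temperature are not given here); it assumes a SupCon-style objective in which patients sharing a group label (e.g., similar retinal morphometry and risk profile) are treated as positives and all other patients as negatives:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_contrastive_loss(embeddings, groups, tau=0.1):
    """SupCon-style group-aware contrastive loss (illustrative only).

    Embeddings with the same group label are pulled together as
    positives; every other sample in the batch acts as a negative.
    `tau` is a temperature hyperparameter (value here is arbitrary).
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and groups[j] == groups[i]]
        if not positives:
            continue  # no positive pair for this anchor
        denom = sum(math.exp(cosine(embeddings[i], embeddings[j]) / tau)
                    for j in range(n) if j != i)
        # Average the -log probability of each positive over the batch.
        loss_i = -sum(
            math.log(math.exp(cosine(embeddings[i], embeddings[j]) / tau) / denom)
            for j in positives
        ) / len(positives)
        total += loss_i
        count += 1
    return total / count

# Toy check: when group labels match the embedding clusters, the loss is
# far lower than when labels cut across clusters.
emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
aligned = group_contrastive_loss(emb, [0, 0, 1, 1])
mixed = group_contrastive_loss(emb, [0, 1, 0, 1])
print(aligned < mixed)  # True
```

In REVEAL's setting, one modality's encoder output (fundus image) and the other's (risk-narrative text) would feed such a loss so that cross-modal pairs from similar patient groups align in the shared embedding space.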


