Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data
arXiv cs.CV / 4/10/2026
Key Points
- Fundus-R1 is a reasoning-enhanced multimodal LLM for fundus image reading, trained entirely on public datasets to avoid the reproducibility and access barriers of prior approaches that relied on clinically paired in-house data.
- The approach uses a retrieval-augmented generation (RAG) mechanism to automatically produce image-specific, knowledge-aware reasoning traces that connect visual findings to ophthalmic knowledge grounded in the available labels (see the first sketch after this list).
- To improve reasoning reliability, the paper extends reinforcement learning with verifiable rewards (RLVR) with a process reward that promotes self-consistency of the generated reasoning trace across rollouts (see the second sketch after this list).
- Experiments on FunBench, Omni-Fundus, and GMAI-Fundus show Fundus-R1 outperforming baselines, including the general-purpose Qwen2.5-VL and variants post-trained without the generated reasoning traces.
- The work suggests a feasible pathway for building stronger fundus-reading MLLMs using public data rather than inaccessible in-house clinical samples.
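
To make the trace-generation step concrete, here is a minimal Python sketch of the RAG idea from the second key point. The knowledge base, function names, and prompt wording are all illustrative assumptions; the paper's actual retriever, corpus, and prompt templates are not specified here.

```python
# Hypothetical sketch of the RAG step: given a fundus image's ground-truth
# labels, retrieve relevant ophthalmic knowledge snippets and build a prompt
# asking an MLLM to write a reasoning trace linking visual findings to that
# knowledge. All names and the toy knowledge base are illustrative.

from typing import List

# Toy knowledge base keyed by label; a real system would retrieve from an
# ophthalmology corpus, e.g. via an embedding index.
KNOWLEDGE_BASE = {
    "diabetic retinopathy": [
        "Microaneurysms appear as small red dots and are an early DR sign.",
        "Hard exudates are yellow lipid deposits, often near the macula.",
    ],
    "glaucoma": [
        "An enlarged cup-to-disc ratio (>0.6) suggests glaucomatous damage.",
    ],
}

def retrieve_knowledge(labels: List[str], top_k: int = 3) -> List[str]:
    """Return up to top_k knowledge snippets grounded in the image's labels."""
    snippets: List[str] = []
    for label in labels:
        snippets.extend(KNOWLEDGE_BASE.get(label.lower(), []))
    return snippets[:top_k]

def build_trace_prompt(labels: List[str], snippets: List[str]) -> str:
    """Assemble a prompt asking the MLLM for an image-specific reasoning trace."""
    knowledge = "\n".join(f"- {s}" for s in snippets)
    return (
        "You are reading a fundus image with the following confirmed labels: "
        f"{', '.join(labels)}.\n"
        f"Relevant ophthalmic knowledge:\n{knowledge}\n"
        "Describe the visual findings you would expect and reason step by step "
        "from those findings to the diagnosis, citing the knowledge above."
    )

if __name__ == "__main__":
    labels = ["diabetic retinopathy"]
    prompt = build_trace_prompt(labels, retrieve_knowledge(labels))
    print(prompt)  # This prompt, plus the image, would be sent to the MLLM.
```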
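
Similarly, a rough sketch of the self-consistency process reward from the third key point, under assumed details: rewards are computed per group of sampled rollouts, a verifiable outcome reward is combined with a trace-agreement term, and token-overlap similarity stands in for whatever agreement measure the paper actually uses.

```python
# Hypothetical sketch of combining a verifiable outcome reward with a
# self-consistency process reward over a group of rollouts, in the spirit of
# the RLVR extension described above. The similarity measure, weighting, and
# rollout structure are assumptions, not the paper's exact formulation.

from typing import List, Tuple

def trace_similarity(a: str, b: str) -> float:
    """Crude token-overlap (Jaccard) similarity between two reasoning traces."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def score_rollouts(
    rollouts: List[Tuple[str, str]],  # (reasoning_trace, final_answer) pairs
    gold_answer: str,
    consistency_weight: float = 0.3,
) -> List[float]:
    """Verifiable outcome reward plus a process reward for traces that agree
    with the other rollouts sampled for the same image and question."""
    rewards: List[float] = []
    for i, (trace, answer) in enumerate(rollouts):
        outcome = 1.0 if answer.strip().lower() == gold_answer.lower() else 0.0
        others = [t for j, (t, _) in enumerate(rollouts) if j != i]
        consistency = (
            sum(trace_similarity(trace, t) for t in others) / len(others)
            if others else 0.0
        )
        rewards.append(outcome + consistency_weight * consistency)
    return rewards

if __name__ == "__main__":
    group = [
        ("microaneurysms and hard exudates suggest DR", "diabetic retinopathy"),
        ("hard exudates near the macula point to DR", "diabetic retinopathy"),
        ("clear disc margins and normal vessels", "normal"),
    ]
    print(score_rollouts(group, gold_answer="diabetic retinopathy"))
```

In a GRPO-style setup, these group-level rewards would then be normalized into advantages for the policy update; the weighting between the outcome and consistency terms here is purely illustrative.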