VOLMO: Versatile and Open Large Models for Ophthalmology

arXiv cs.CV / 3/26/2026

Key Points

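  • Early eye examinations are key to preventing blindness, but existing MLLMs that integrate images, structured data, and free text perform poorly in ophthalmology, and few open ophthalmology-specific models exist.
  • VOLMO is a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. It proceeds in three stages: (1) ophthalmology knowledge pretraining on 86,965 image-text pairs, (2) task fine-tuning for screening and severity classification across 12 eye conditions, and (3) multi-step reasoning on 913 patient cases.
  • The authors trained a compact 2B-parameter model, VOLMO-2B, and compared it with several strong baselines, including InternVL-2B and LLaVA-Med-7B. VOLMO-2B was consistently ahead on image description, disease screening and staging classification, and assessment-and-management generation.
  • Average F1 across the 12 conditions reached 87.4%, and the model is reported to score higher than the baselines in external validation on independent cohorts for age-related macular degeneration and diabetic retinopathy.
  • The work provides a reproducible training pipeline for applying multimodal LLMs to ophthalmology clinical workflows and could serve as a "reference implementation" for future ophthalmology-specific model development.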
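
The three stages listed above can be written down as a small configuration table. This is a minimal sketch for orientation only: the `Stage` class, its field names, and the stage identifiers are hypothetical and do not come from the VOLMO code release; only the dataset sizes are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure, for illustration only)."""
    name: str
    data_unit: str
    num_examples: int
    goal: str


# Dataset sizes are the ones stated in the abstract; everything else is assumed.
VOLMO_STAGES = (
    Stage("knowledge_pretraining", "image-text pairs", 86_965,
          "ophthalmology knowledge"),
    Stage("task_finetuning", "annotated instances", 26_929,
          "screening and severity classification for 12 eye conditions"),
    Stage("clinical_reasoning", "patient case reports", 913,
          "assessment, planning, and follow-up care"),
)


def describe(stages):
    """Return one human-readable line per stage, in training order."""
    return [
        f"{i}. {s.name}: {s.num_examples:,} {s.data_unit}"
        for i, s in enumerate(stages, start=1)
    ]


for line in describe(VOLMO_STAGES):
    print(line)
```

The ordering of the tuple mirrors the staged curriculum: broad domain knowledge first, then task labels, then the much smaller set of case reports for reasoning.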

Abstract

Vision impairment affects millions globally, and early detection is critical to preventing irreversible vision loss. Ophthalmology workflows require clinicians to integrate medical images, structured clinical data, and free-text notes to determine disease severity and management, which is time-consuming and burdensome. Recent multimodal large language models (MLLMs) show promise, but existing general and medical MLLMs perform poorly in ophthalmology, and few ophthalmology-specific MLLMs are openly available. We present VOLMO (Versatile and Open Large Models for Ophthalmology), a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. VOLMO includes three stages: ophthalmology knowledge pretraining on 86,965 image-text pairs from 26,569 articles across 82 journals; domain task fine-tuning on 26,929 annotated instances spanning 12 eye conditions for disease screening and severity classification; and multi-step clinical reasoning on 913 patient case reports for assessment, planning, and follow-up care. Using this framework, we trained a compact 2B-parameter MLLM and compared it with strong baselines, including InternVL-2B, LLaVA-Med-7B, MedGemma-4B, MedGemma-27B, and RETFound. We evaluated these models on image description generation, disease screening and staging classification, and assessment-and-management generation, with additional manual review by two healthcare professionals and external validation on three independent cohorts for age-related macular degeneration and diabetic retinopathy. Across settings, VOLMO-2B consistently outperformed baselines, achieving stronger image description performance, an average F1 of 87.4% across 12 eye conditions, and higher scores in external validation.
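
To make the "average F1 across 12 eye conditions" figure concrete, the sketch below computes a per-condition F1 from confusion counts and then takes an unweighted mean. The abstract does not state which averaging scheme was used, so the unweighted (macro) mean is an assumption, and the condition names and counts are toy values, not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Unweighted mean of per-condition F1 scores (assumed averaging scheme)."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


# Toy (tp, fp, fn) counts for two hypothetical conditions.
toy_counts = {
    "condition_a": (8, 2, 2),
    "condition_b": (5, 0, 5),
}

print(f"macro F1 = {macro_f1(toy_counts):.3f}")
```

An unweighted mean gives every condition equal influence regardless of how many test cases it has, which matters when rare conditions sit alongside common ones.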