VOLMO: Versatile and Open Large Models for Ophthalmology

arXiv cs.CV / 3/26/2026



Key Points

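Early eye examination is key to preventing blindness, yet existing MLLMs that integrate images, structured data, and free text perform inadequately in ophthalmology, and open ophthalmology-specific models are scarce. This is the background the paper sets out.
VOLMO is a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. It proceeds in three stages: (1) ophthalmology knowledge pretraining on 86,965 image-text pairs, (2) task fine-tuning for screening and severity classification across 12 eye conditions, and (3) multi-step reasoning on 913 patient cases.
The authors trained a compact 2B-parameter model, VOLMO-2B, and compared it with several strong baselines, including InternVL-2B and LLaVA-Med. It was consistently ahead on image description, disease screening and staging classification, and assessment-and-management generation.
The average F1 across the 12 conditions reached 87.4%, and the model is also reported to score higher in external validation on independent cohorts for age-related macular degeneration and diabetic retinopathy.
The work provides a reproducible training pipeline for applying multimodal LLMs to ophthalmic clinical workflows, and could serve as a "reference implementation" for future ophthalmology-specific models.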
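As a reading aid, the three-stage recipe can be laid out as a small data structure. This is only an illustrative sketch: the `Stage` class, the stage names, and the `describe` helper are hypothetical and are not part of VOLMO. The counts are the ones reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage; a hypothetical container, not a VOLMO API."""
    name: str
    data_unit: str
    n_examples: int


# Counts come from the abstract; the stage names are illustrative labels.
VOLMO_STAGES = (
    Stage("knowledge_pretraining", "image-text pairs", 86_965),
    Stage("task_finetuning", "annotated instances", 26_929),
    Stage("clinical_reasoning", "patient case reports", 913),
)


def describe(stages):
    """Return one numbered summary line per stage, in training order."""
    return [
        f"{i}. {s.name}: {s.n_examples:,} {s.data_unit}"
        for i, s in enumerate(stages, start=1)
    ]


for line in describe(VOLMO_STAGES):
    print(line)
```

The ordering matters more than the sizes: each stage is far smaller than the one before it, moving from broad domain knowledge toward narrow clinical reasoning.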

Abstract

Vision impairment affects millions globally, and early detection is critical to preventing irreversible vision loss. Ophthalmology workflows require clinicians to integrate medical images, structured clinical data, and free-text notes to determine disease severity and management, which is time-consuming and burdensome. Recent multimodal large language models (MLLMs) show promise, but existing general and medical MLLMs perform poorly in ophthalmology, and few ophthalmology-specific MLLMs are openly available. We present VOLMO (Versatile and Open Large Models for Ophthalmology), a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. VOLMO includes three stages: ophthalmology knowledge pretraining on 86,965 image-text pairs from 26,569 articles across 82 journals; domain task fine-tuning on 26,929 annotated instances spanning 12 eye conditions for disease screening and severity classification; and multi-step clinical reasoning on 913 patient case reports for assessment, planning, and follow-up care. Using this framework, we trained a compact 2B-parameter MLLM and compared it with strong baselines, including InternVL-2B, LLaVA-Med-7B, MedGemma-4B, MedGemma-27B, and RETFound. We evaluated these models on image description generation, disease screening and staging classification, and assessment-and-management generation, with additional manual review by two healthcare professionals and external validation on three independent cohorts for age-related macular degeneration and diabetic retinopathy. Across settings, VOLMO-2B consistently outperformed baselines, achieving stronger image description performance, an average F1 of 87.4% across 12 eye conditions, and higher scores in external validation.
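To make the "average F1 across 12 eye conditions" figure concrete, the sketch below computes a per-condition F1 and then an unweighted mean. It assumes the reported average is a simple (macro) mean over conditions, which the abstract does not state. The condition names and counts are toy values invented for illustration, not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(counts):
    """Unweighted mean of per-condition F1.

    `counts` maps a condition name to (true_pos, false_pos, false_neg).
    """
    scores = {name: f1_score(*c) for name, c in counts.items()}
    return sum(scores.values()) / len(scores), scores


# Toy counts (hypothetical), three conditions instead of twelve.
toy_counts = {
    "condition_a": (90, 10, 10),  # F1 = 180 / 200 = 0.90
    "condition_b": (40, 10, 10),  # F1 = 80 / 100 = 0.80
    "condition_c": (30, 10, 10),  # F1 = 60 / 80 = 0.75
}

average, per_condition = macro_f1(toy_counts)
print(f"macro F1 = {average:.3f}")
```

A macro average weights every condition equally, so a rare condition with few cases affects the headline number as much as a common one. A sample-weighted average would behave differently.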


