CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation

arXiv cs.RO / 3/24/2026

Opinion / Signals & Early Trends / Tools & Practical Usage / Models & Research

Key Points

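CataractSAM-2 is a version of Meta's SAM 2 domain-adapted to ophthalmic surgery, aiming at real-time, high-accuracy semantic segmentation of cataract surgery videos.
It is intended to strengthen the intraoperative perception that surgical robotics and computer-assisted surgery depend on, and sits at the intersection of computer vision and medical robotics.
To reduce the manual labeling burden, the authors propose an interactive annotation framework that combines sparse prompts with video mask propagation, making high-quality ground-truth generation easier to scale.
Zero-shot generalization to glaucoma surgery (trabeculectomy) is also demonstrated, suggesting the model may be useful across surgical procedures.
The trained model and annotation toolkit are released as open source to promote the expansion of anterior segment surgery datasets and the development of practical medical AI.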

Abstract

We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2, designed for real-time, high-accuracy semantic segmentation of cataract surgery videos. Positioned at the intersection of computer vision and medical robotics, CataractSAM-2 enables precise intraoperative perception crucial for robotic-assisted and computer-guided surgical systems. Furthermore, to alleviate the burden of manual labeling, we introduce an interactive annotation framework that combines sparse prompts with video-based mask propagation. This tool significantly reduces annotation time and facilitates the scalable creation of high-quality ground-truth masks, accelerating dataset development for ocular anterior segment surgeries. We also demonstrate the model's strong zero-shot generalization to glaucoma trabeculectomy procedures, confirming its cross-procedural utility and potential for broader surgical applications. The trained model and annotation toolkit are released as open-source resources, establishing CataractSAM-2 as a foundation for expanding anterior ophthalmic surgical datasets and advancing real-time AI-driven solutions in medical robotics and surgical video understanding.
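
The annotation workflow described in the abstract (a sparse prompt on one frame, then mask propagation through the video) can be illustrated with a toy sketch. This is not the CataractSAM-2 or SAM 2 implementation: the flood-fill "segmenter", the centroid-based propagation rule, and every function name below are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
from collections import deque


def segment_from_click(frame, seed, tol=0.2):
    """Toy stand-in for a promptable segmenter: flood-fill from one clicked pixel."""
    h, w = len(frame), len(frame[0])
    r0, c0 = seed
    ref = frame[r0][c0]
    mask = [[False] * w for _ in range(h)]
    mask[r0][c0] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < h and 0 <= nc < w
            if inside and not mask[nr][nc] and abs(frame[nr][nc] - ref) <= tol:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def centroid(mask):
    """Integer centroid (row, col) of the pixels set in a mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    return (sum(r for r, _ in pts) // n, sum(c for _, c in pts) // n)


def propagate(frames, first_click, tol=0.2):
    """One human click on frame 0; each later frame is prompted by the previous centroid."""
    masks = []
    seed = first_click
    for frame in frames:
        mask = segment_from_click(frame, seed, tol)
        masks.append(mask)
        seed = centroid(mask)  # assumed propagation rule, for illustration only
    return masks


def make_frame(size, top, left, side):
    """Synthetic frame: a bright square (1.0) on a dark background (0.0)."""
    return [[1.0 if top <= r < top + side and left <= c < left + side else 0.0
             for c in range(size)] for r in range(size)]


# A 4x4 square drifting one pixel to the right per frame.
frames = [make_frame(12, 2, 2 + t, 4) for t in range(5)]
masks = propagate(frames, first_click=(3, 3))
areas = [sum(v for row in m for v in row) for m in masks]
print(areas)
```

In this toy, a single click yields masks for all five frames. It also shows why interactive correction matters: if the object moved far enough that the previous centroid landed on background, the propagated mask would be wrong and a new prompt would be needed on that frame.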
