VAMAE: Vessel-Aware Masked Autoencoders for OCT Angiography

arXiv cs.CV / 4/9/2026


Key Points

  • The paper introduces VAMAE, a vessel-aware masked autoencoder framework tailored for OCT angiography (OCTA) representation learning where vessel structures are sparse and constrained by vascular topology.
  • Unlike standard masked autoencoders that use uniform masking and pixel-level reconstruction for natural images, VAMAE uses anatomically informed masking guided by vesselness and skeleton cues to emphasize vessel-rich areas and connectivity patterns.
  • VAMAE’s pretraining uses a multi-target reconstruction objective that captures complementary aspects of OCTA imagery: appearance, structure, and topology.
  • Experiments on the OCTA-500 benchmark across multiple vessel segmentation tasks show consistent gains over conventional masked autoencoding baselines, especially when labeled data is limited.
  • The authors argue the results support geometry-aware self-supervised learning as a promising direction for more robust OCTA analysis in data-scarce settings.
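
The paper does not include code, but the anatomically informed masking idea can be sketched in a few lines: instead of masking patches uniformly at random, bias the sampling toward patches with high vesselness. The function below is a minimal numpy illustration under assumed inputs (a precomputed vesselness map, e.g. from a Frangi-style filter); the patch size, mask ratio, and mixing weight are hypothetical, not the paper's settings.

```python
import numpy as np

def vessel_aware_mask(vesselness, patch=16, mask_ratio=0.75, bias=0.8, rng=None):
    """Sample a patch-level mask biased toward vessel-rich patches.

    vesselness : (H, W) array in [0, 1], assumed precomputed (illustrative).
    Returns a boolean (H//patch, W//patch) grid; True = patch is masked.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = vesselness.shape
    gh, gw = H // patch, W // patch
    # Mean vesselness per non-overlapping patch.
    scores = (vesselness[:gh * patch, :gw * patch]
              .reshape(gh, patch, gw, patch)
              .mean(axis=(1, 3))
              .ravel())
    n_mask = int(round(mask_ratio * gh * gw))
    # Mix vessel-weighted and uniform sampling; bias=1.0 masks almost
    # exclusively vessel-rich patches, bias=0.0 recovers uniform MAE masking.
    w = scores + 1e-8
    p = bias * w / w.sum() + (1.0 - bias) / w.size
    p = p / p.sum()
    idx = rng.choice(gh * gw, size=n_mask, replace=False, p=p)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[idx] = True
    return mask.reshape(gh, gw)
```

With a high bias the reconstruction task concentrates on vascular regions, which is the intuition the key points describe; skeleton cues could be folded in the same way by adding a skeleton-coverage term to the per-patch scores.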

Abstract

Optical coherence tomography angiography (OCTA) provides non-invasive visualization of retinal microvasculature, but learning robust representations remains challenging due to sparse vessel structures and strong topological constraints. Many existing self-supervised learning approaches, including masked autoencoders, are primarily designed for dense natural images and rely on uniform masking and pixel-level reconstruction, which may inadequately capture vascular geometry. We propose VAMAE, a vessel-aware masked autoencoding framework for self-supervised pretraining on OCTA images. The approach incorporates anatomically informed masking that emphasizes vessel-rich regions using vesselness and skeleton-based cues, encouraging the model to focus on vascular connectivity and branching patterns. In addition, the pretraining objective includes reconstructing multiple complementary targets, enabling the model to capture appearance, structural, and topological information. We evaluate the proposed pretraining strategy on the OCTA-500 benchmark for several vessel segmentation tasks under varying levels of supervision. The results indicate that vessel-aware masking and multi-target reconstruction provide consistent improvements over standard masked autoencoding baselines, particularly in limited-label settings, suggesting the potential of geometry-aware self-supervised learning for OCTA analysis.
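
The multi-target objective mentioned in the abstract can likewise be sketched as a weighted sum of per-target reconstruction losses. The target names, loss choices (MSE for continuous maps, binary cross-entropy for the binary skeleton), and weights below are illustrative assumptions, not the paper's exact heads or hyperparameters.

```python
import numpy as np

def multi_target_loss(pred, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of reconstruction losses over complementary targets.

    pred, targets : dicts with keys 'pixel', 'vesselness', 'skeleton'
    (hypothetical names for the appearance, structural, and topological
    targets described in the abstract). In a real MAE pipeline these
    would be computed only over masked patches.
    """
    w_pix, w_ves, w_skel = weights
    # MSE for continuous targets: raw pixels and the vesselness map.
    l_pix = np.mean((pred['pixel'] - targets['pixel']) ** 2)
    l_ves = np.mean((pred['vesselness'] - targets['vesselness']) ** 2)
    # Binary cross-entropy for the binary skeleton target.
    p = np.clip(pred['skeleton'], 1e-6, 1.0 - 1e-6)
    t = targets['skeleton']
    l_skel = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return w_pix * l_pix + w_ves * l_ves + w_skel * l_skel
```

Reconstructing the vesselness and skeleton maps alongside the pixels is what lets the encoder trade off appearance fidelity against structural and topological fidelity, which matches the limited-label gains the abstract reports.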