Label-efficient underwater species classification with semi-supervised learning on frozen foundation model embeddings

arXiv cs.CV / 4/2/2026


Key Points

  • The paper proposes a label-efficient semi-supervised underwater species classifier by running self-training on frozen DINOv3 ViT-B foundation model embeddings, avoiding any embedding fine-tuning.
  • With fewer than 5% of the available labels, the method largely closes the performance gap to a fully supervised ConvNeXt model trained on all labeled data; with full label availability, the remaining gap is only a few percentage points.
  • Evaluation on the AQUA20 marine species benchmark shows strong class separability in the frozen embedding space (high ROC-AUC), suggesting discriminative structure is present even when decision boundaries are not yet well estimated.
  • The authors argue the approach is practically deployable: it requires no model training, no domain-specific data engineering, and no underwater-adapted models. All reported results are averaged over 100 random seed initializations.
  • The core contribution is demonstrating that semi-supervised learning atop pretrained, frozen representations can reduce expert annotation costs and improve transfer across new underwater conditions.
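The paper does not include code, but the nearest-neighbor self-training loop the key points describe can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the function name, the cosine-similarity metric, and the "top similarity = confidence" heuristic for selecting which pseudo-labels to commit each round are all assumptions.

```python
import numpy as np

def nn_self_train(X, seed_idx, seed_labels, rounds=5, per_round=10):
    """Illustrative nearest-neighbor self-training on frozen embeddings.

    Each round, the unlabeled points most similar (cosine) to any labeled
    point adopt that neighbor's label and join the labeled pool. The
    confidence rule (top similarity) is an assumption, not the paper's
    exact procedure.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize once
    labels = np.full(len(X), -1)                       # -1 marks "unlabeled"
    labels[seed_idx] = seed_labels
    for _ in range(rounds):
        lab = np.flatnonzero(labels >= 0)
        unl = np.flatnonzero(labels < 0)
        if unl.size == 0:
            break
        sims = Xn[unl] @ Xn[lab].T                     # (n_unl, n_lab) cosine sims
        best = sims.argmax(axis=1)                     # nearest labeled neighbor
        conf = sims[np.arange(unl.size), best]
        order = np.argsort(-conf)[:per_round]          # most confident first
        labels[unl[order]] = labels[lab[best[order]]]  # commit pseudo-labels
    return labels
```

Because the embeddings stay frozen, each round is a pure array computation; the only "training" is the growing pool of pseudo-labeled points.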

Abstract

Automated species classification from underwater imagery is bottlenecked by the cost of expert annotation, and supervised models trained on one dataset rarely transfer to new conditions. We investigate whether semi-supervised methods operating on frozen foundation model embeddings can close this annotation gap with minimal labeling effort. Using DINOv3 ViT-B embeddings with no fine-tuning, we propagate a small set of labeled seeds through unlabeled data via nearest-neighbor-based self-training and evaluate on the AQUA20 benchmark (20 marine species). With fewer than 5% of the training labels, self-training on frozen embeddings closes much of the gap to a fully supervised ConvNeXt baseline trained on the entire labeled dataset; at full supervision, the gap narrows to a few percentage points, with several species exceeding the supervised baseline. Class separability in the embedding space, measured by ROC-AUC, is high even at extreme label scarcity, indicating that the frozen representations capture discriminative structure well before decision boundaries can be reliably estimated. Our approach requires no training, no domain-specific data engineering, and no underwater-adapted models, establishing a practical, immediately deployable baseline for label-efficient marine species recognition. All results are reported on the held-out test set over 100 random seed initializations.
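The abstract's separability measurement can be illustrated with a small sketch: score each point by cosine similarity to a class prototype built from a handful of labeled seeds, then compute one-vs-rest ROC-AUC via the rank-sum identity. Prototype scoring is a hypothetical stand-in here; the paper does not specify its exact scoring function, and ties are ignored for simplicity.

```python
import numpy as np

def roc_auc(scores, y):
    """Binary ROC-AUC via the Mann-Whitney rank-sum identity:
    AUC = P(score of a random positive > score of a random negative).
    Ties are ignored for simplicity in this sketch."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def prototype_scores(X, seed_idx, seed_labels, target_class):
    """Cosine similarity of every point to the mean (prototype) of the
    target class's labeled seed embeddings -- a hypothetical one-vs-rest
    scoring rule for probing separability of frozen embeddings."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    proto = Xn[seed_idx[seed_labels == target_class]].mean(axis=0)
    return Xn @ (proto / np.linalg.norm(proto))
```

High AUC under this kind of scoring with only a few seeds per class is exactly the situation the abstract describes: the frozen embedding space already ranks same-class points above other-class points, even before enough labels exist to place accurate decision boundaries.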