Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces Mine-JEPA, described as the first in-domain self-supervised learning pipeline specifically for side-scan sonar (SSS) mine classification under extreme data scarcity and a strong domain gap versus natural images.
  • Using SIGReg, a regularization-based SSL loss, and only 1,170 unlabeled sonar images, Mine-JEPA achieves an F1 score of 0.935 on the binary mine vs. non-mine task, outperforming a fine-tuned DINOv3 baseline.
  • For a 3-class mine-like object classification task, Mine-JEPA reaches an F1 score of 0.820 with synthetic data augmentation and again surpasses fine-tuned DINOv3.
  • The study finds that applying in-domain SSL to an already strong foundation model can significantly degrade performance (by about 10–13 percentage points), implying that more pretraining or adaptation is not always beneficial.
  • The method also demonstrates parameter efficiency: with a compact ViT-Tiny backbone, Mine-JEPA offers competitive results using about 4x fewer parameters than DINOv3, supporting the case for tailored in-domain SSL over larger models in sonar imagery.
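A JEPA-style objective pairs a prediction loss in embedding space with a regularizer that prevents representation collapse. The paper's actual SIGReg loss is not reproduced here; the sketch below uses a simple per-dimension variance penalty as a hedged stand-in for the regularization term, with random arrays in place of encoder outputs. All names, shapes, and the `reg_weight` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative embeddings: in a real pipeline, `predicted` would come from
# a context encoder + predictor, and `target` from a (typically EMA) target
# encoder applied to masked regions of the same sonar image.
predicted = rng.normal(size=(8, 32))  # batch of 8, 32-dim embeddings
target = rng.normal(size=(8, 32))

def jepa_loss(pred, tgt, reg_weight=0.1):
    # Prediction term: mean squared error in embedding space
    # (JEPA predicts representations, not pixels).
    pred_loss = np.mean((pred - tgt) ** 2)
    # Collapse-prevention term (stand-in for SIGReg): penalize any
    # embedding dimension whose batch standard deviation drops below 1,
    # so the encoder cannot map everything to a constant vector.
    std = np.sqrt(pred.var(axis=0) + 1e-6)
    reg = np.mean(np.maximum(0.0, 1.0 - std))
    return pred_loss + reg_weight * reg

loss = jepa_loss(predicted, target)
```

Without the regularization term, the trivial solution of collapsing all embeddings to one point would drive the prediction loss to zero, which is the failure mode regularization-based SSL losses are designed to rule out.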

Abstract

Side-scan sonar (SSS) mine classification is a challenging maritime vision problem characterized by extreme data scarcity and a large domain gap from natural images. While self-supervised learning (SSL) and general-purpose vision foundation models have shown strong performance in general vision and several specialized domains, their use in SSS remains largely unexplored. We present Mine-JEPA, the first in-domain SSL pipeline for SSS mine classification, using SIGReg, a regularization-based SSL loss, to pretrain on only 1,170 unlabeled sonar images. In the binary mine vs. non-mine setting, Mine-JEPA achieves an F1 score of 0.935, outperforming fine-tuned DINOv3 (0.922), a foundation model pretrained on 1.7B images. For 3-class mine-like object classification, Mine-JEPA reaches 0.820 with synthetic data augmentation, again outperforming fine-tuned DINOv3 (0.810). We further observe that applying in-domain SSL to foundation models degrades performance by 10--13 percentage points, suggesting that stronger pretrained models do not always benefit from additional domain adaptation. In addition, Mine-JEPA with a compact ViT-Tiny backbone achieves competitive performance while using 4x fewer parameters than DINOv3. These results suggest that carefully designed in-domain self-supervised learning is a viable alternative to much larger foundation models in data-scarce maritime sonar imagery.
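Both headline results are reported as F1 scores. For the binary mine vs. non-mine setting, F1 is the harmonic mean of precision and recall over the positive (mine) class; a minimal computation on toy labels (the arrays below are illustrative, not the paper's data):

```python
def binary_f1(y_true, y_pred):
    # Count true positives, false positives, and false negatives
    # for the positive (mine = 1) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 true positives, 1 false positive, 1 false negative.
score = binary_f1([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1])  # → 0.75
```

Because F1 ignores true negatives, it is a common choice for detection-style tasks like mine classification, where the negative (non-mine) class can dominate the data.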