Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces Mine-JEPA, described as the first in-domain self-supervised learning pipeline specifically for side-scan sonar (SSS) mine classification under extreme data scarcity and a strong domain gap versus natural images.
  • Using SIGReg, a regularization-based SSL loss, and only 1,170 unlabeled sonar images, Mine-JEPA achieves an F1 score of 0.935 on the binary mine vs. non-mine task, outperforming a fine-tuned DINOv3 baseline.
  • For a 3-class mine-like object classification task, Mine-JEPA reaches an F1 score of 0.820 with synthetic data augmentation and again surpasses fine-tuned DINOv3.
  • The study finds that applying in-domain SSL to an already strong foundation model can significantly degrade performance (by about 10–13 percentage points), implying that more pretraining or adaptation is not always beneficial.
  • The method also demonstrates parameter efficiency: with a compact ViT-Tiny backbone, Mine-JEPA offers competitive results using about 4x fewer parameters than DINOv3, supporting the case for tailored in-domain SSL over larger models in sonar imagery.
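A JEPA-style objective pairs a prediction loss in embedding space with a regularizer that prevents representation collapse. The paper's actual SIGReg loss is not reproduced here; the sketch below uses a simple per-dimension variance penalty as a hedged stand-in for the regularization term, with random arrays in place of encoder outputs. All names, shapes, and the `reg_weight` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative embeddings: in a real pipeline, `predicted` would come from
# a context encoder + predictor, and `target` from a (typically EMA) target
# encoder applied to masked regions of the same sonar image.
predicted = rng.normal(size=(8, 32))  # batch of 8, 32-dim embeddings
target = rng.normal(size=(8, 32))

def jepa_loss(pred, tgt, reg_weight=0.1):
    # Prediction term: mean squared error in embedding space
    # (JEPA predicts representations, not pixels).
    pred_loss = np.mean((pred - tgt) ** 2)
    # Collapse-prevention term (stand-in for SIGReg): penalize any
    # embedding dimension whose batch standard deviation drops below 1,
    # so the encoder cannot map everything to a constant vector.
    std = np.sqrt(pred.var(axis=0) + 1e-6)
    reg = np.mean(np.maximum(0.0, 1.0 - std))
    return pred_loss + reg_weight * reg

loss = jepa_loss(predicted, target)
```

Without the regularization term, the trivial solution of collapsing all embeddings to one point would drive the prediction loss to zero, which is the failure mode regularization-based SSL losses are designed to rule out.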

Abstract

Side-scan sonar (SSS) mine classification is a challenging maritime vision problem characterized by extreme data scarcity and a large domain gap from natural images. While self-supervised learning (SSL) and general-purpose vision foundation models have shown strong performance in general vision and several specialized domains, their use in SSS remains largely unexplored. We present Mine-JEPA, the first in-domain SSL pipeline for SSS mine classification, using SIGReg, a regularization-based SSL loss, to pretrain on only 1,170 unlabeled sonar images. In the binary mine vs. non-mine setting, Mine-JEPA achieves an F1 score of 0.935, outperforming fine-tuned DINOv3 (0.922), a foundation model pretrained on 1.7B images. For 3-class mine-like object classification, Mine-JEPA reaches 0.820 with synthetic data augmentation, again outperforming fine-tuned DINOv3 (0.810). We further observe that applying in-domain SSL to foundation models degrades performance by 10--13 percentage points, suggesting that stronger pretrained models do not always benefit from additional domain adaptation. In addition, Mine-JEPA with a compact ViT-Tiny backbone achieves competitive performance while using 4x fewer parameters than DINOv3. These results suggest that carefully designed in-domain self-supervised learning is a viable alternative to much larger foundation models in data-scarce maritime sonar imagery.
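Both headline results are reported as F1 scores. For the binary mine vs. non-mine setting, F1 is the harmonic mean of precision and recall over the positive (mine) class; a minimal computation on toy labels (the arrays below are illustrative, not the paper's data):

```python
def binary_f1(y_true, y_pred):
    # Count true positives, false positives, and false negatives
    # for the positive (mine = 1) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 true positives, 1 false positive, 1 false negative.
score = binary_f1([1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1])  # → 0.75
```

Because F1 ignores true negatives, it is a common choice for detection-style tasks like mine classification, where the negative (non-mine) class can dominate the data.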