A Probabilistic Framework for Improving Dense Object Detection in Underwater Image Data via Annealing-Based Data Augmentation

arXiv cs.CV / 4/24/2026


Key Points

  • Compared with controlled environments, underwater object detection suffers from highly variable lighting, water clarity, and viewpoints, as well as frequent occlusions.
  • The study builds a custom detection dataset from the DeepFish segmentation masks by generating bounding-box annotations, then uses a pseudo-simulated-annealing augmentation method inspired by copy-paste to create realistic crowded fish scenes.
  • The augmentation increases spatial diversity and object density during training, which helps the model generalize better to complex underwater scenes.
  • Experiments show the proposed method significantly outperforms a baseline YOLOv10 model, especially on a hard test set of manually annotated images captured from live-stream footage in the Florida Keys.
  • Overall, the work highlights data augmentation as an effective way to improve detection robustness in dense, real-world underwater scenes without changing the underlying detector architecture.
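The first step described above, converting the DeepFish segmentation masks into detection labels, amounts to taking the extent of each mask region as a bounding box. The paper's own conversion code is not shown here; the function below is a minimal NumPy sketch assuming a labeled mask where 0 is background and each positive integer is one object instance.

```python
import numpy as np

def mask_to_bboxes(mask):
    """Convert a labeled segmentation mask (H x W, int labels, 0 = background)
    into a list of (x_min, y_min, x_max, y_max) boxes in pixel coordinates."""
    boxes = []
    for label in np.unique(mask):
        if label == 0:
            continue  # skip background
        ys, xs = np.nonzero(mask == label)  # pixel coordinates of this instance
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
    return boxes
```

A real pipeline would then normalize these boxes into the detector's expected label format (e.g. YOLO-style center/width/height fractions).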

Abstract

Object detection models typically perform well on images captured in controlled environments with stable lighting, water clarity, and viewpoint, but their performance degrades substantially in real-world underwater settings characterized by high variability and frequent occlusions. In this work, we address these challenges by introducing a novel data augmentation framework designed to improve robustness in dense and unconstrained underwater scenes. Using the DeepFish dataset, which contains images of fish in natural environments, we first generate bounding box annotations from provided segmentation masks to construct a custom detection dataset. We then propose a pseudo-simulated annealing-based augmentation algorithm, inspired by the copy-paste strategy of Deng et al. [1], to synthesize realistic crowded fish scenarios. Our approach improves spatial diversity and object density during training, enabling better generalization to complex scenes. Experimental results show that our method significantly outperforms a baseline YOLOv10 model, particularly on a challenging test set of manually annotated images collected from live-stream footage in the Florida Keys. These results demonstrate the effectiveness of our augmentation strategy for improving detection performance in dense, real-world underwater environments.
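The abstract describes a pseudo-simulated-annealing variant of copy-paste augmentation but does not spell out the algorithm, so the following is only a generic sketch of how annealing can drive object placement: candidate paste positions are proposed at random, moves that reduce an overlap cost are accepted, worse moves are accepted with probability exp(-delta/T), and the temperature T decays each step. All names and parameters here are illustrative, not the authors' implementation.

```python
import math
import random

def anneal_paste(canvas_w, canvas_h, crops, t0=1.0, cooling=0.9, steps=50):
    """Place object crops (list of (w, h) sizes) on a canvas by an
    annealing-style search that discourages heavy pairwise overlap."""
    def overlap(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        return ix * iy  # intersection area of two axis-aligned boxes

    def cost(ps):
        # total pairwise overlap across all placements
        return sum(overlap(ps[i], ps[j])
                   for i in range(len(ps)) for j in range(i + 1, len(ps)))

    # start from uniformly random in-bounds placements
    placements = [(random.uniform(0, canvas_w - w),
                   random.uniform(0, canvas_h - h), w, h) for (w, h) in crops]

    t = t0
    for _ in range(steps):
        i = random.randrange(len(placements))
        w, h = placements[i][2], placements[i][3]
        candidate = placements.copy()
        candidate[i] = (random.uniform(0, canvas_w - w),
                        random.uniform(0, canvas_h - h), w, h)
        delta = cost(candidate) - cost(placements)
        # accept improvements always; accept worse moves with annealing probability
        if delta < 0 or random.random() < math.exp(-delta / max(t, 1e-9)):
            placements = candidate
        t *= cooling  # cool the temperature
    return placements
```

In an actual augmentation pipeline, each accepted placement would be realized by alpha-compositing the masked fish crop onto the target image and emitting the corresponding bounding box; the overlap cost could also be inverted or tuned to deliberately synthesize crowded, partially occluded scenes as the paper targets.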