PolarMAE: Efficient Fetal Ultrasound Pre-training via Semantic Screening and Polar-Guided Masking

arXiv cs.CV · April 20, 2026


Key Points

  • The paper proposes PolarMAE, an ultrasound-specific pre-training framework that addresses shortcomings of prior methods, which ignore US imaging characteristics such as severe data redundancy, fan-shaped locality, and polar-coordinate beamforming.
  • It introduces Progressive Visual-Semantic Screening (PVSS) to adaptively select high-value samples and reduce continuous-scan redundancy, improving pre-training efficiency.
  • It adds an Acoustic-Bounded Region Constraint (ABRC) to restrict learning to valid acoustic regions, preventing the model from focusing on invalid dark background areas.
  • It designs Polar-Texture Collaborative Masking (PTCM) to leverage beamforming priors and local details, helping the model learn radial imaging patterns and important tissue structures.
  • Experiments across multiple datasets and downstream fetal ultrasound interpretation tasks show state-of-the-art results with strong scalability and efficiency for pre-training.

Abstract

Intelligent fetal ultrasound (US) interpretation is crucial for prenatal diagnosis, but high annotation costs and operator-induced variance make unsupervised pre-training a highly promising paradigm. However, existing pre-training methods largely ignore US-specific characteristics -- severe data redundancy, fan-shaped locality, and polar-coordinate beamforming -- limiting their effectiveness in downstream tasks. To address this, we propose PolarMAE, a novel and efficient pre-training framework tailored for US images. Specifically, to mitigate continuous-scanning redundancy, we introduce Progressive Visual-Semantic Screening (PVSS), which adaptively extracts high-value samples and significantly boosts pre-training efficiency. Furthermore, we design an Acoustic-Bounded Region Constraint (ABRC) to accommodate US locality, forcing the model to focus strictly on valid acoustic regions rather than invalid dark backgrounds. Finally, leveraging the beamforming prior and local details, we propose Polar-Texture Collaborative Masking (PTCM), enabling the model to capture underlying radial imaging patterns and critical tissue structures. Extensive experiments across diverse datasets and downstream interpretation tasks demonstrate that our method achieves state-of-the-art performance with strong pre-training scalability and efficiency.
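To make the geometric ideas concrete, the sketch below shows one plausible way to combine an acoustic-region constraint with polar-guided masking on a ViT patch grid. This is not the paper's implementation: the fan geometry (apex position, 35° half-angle, 0.95 radius), the sector count, and the mask ratio are all illustrative assumptions. The idea is that patches outside the fan-shaped acoustic region are excluded from the masking budget (ABRC-style), and masking proceeds by dropping whole angular sectors, following the radial structure that beamforming imposes.

```python
import numpy as np

def in_fan(dy, dx, r_max=0.95, half_angle_deg=35.0):
    """ABRC-style validity test: a point is valid if it lies inside a fan
    of the given radius and half-angle whose apex is the coordinate origin.
    Geometry parameters are illustrative, not taken from the paper."""
    theta = np.degrees(np.arctan2(dx, dy))  # 0 deg points straight down
    valid = (np.hypot(dy, dx) <= r_max) & (np.abs(theta) <= half_angle_deg)
    return valid, theta

def polar_sector_mask(grid=14, mask_ratio=0.75, n_sectors=12, seed=0):
    """Illustrative polar-guided masking on a grid x grid patch map:
    drop whole angular sectors of valid patches until the mask ratio
    (measured over valid patches only) is reached."""
    rng = np.random.default_rng(seed)
    # Patch-center coordinates in [0, 1]; fan apex at top centre (0, 0.5).
    cy = (np.arange(grid)[:, None] + 0.5) / grid
    cx = (np.arange(grid)[None, :] + 0.5) / grid
    valid, theta = in_fan(cy, cx - 0.5)
    # Bin each patch into an angular sector across the fan's [-35, 35] deg span.
    sector = np.clip(((theta + 35.0) / 70.0 * n_sectors).astype(int),
                     0, n_sectors - 1)
    mask = np.zeros((grid, grid), dtype=bool)
    for s in rng.permutation(n_sectors):
        if mask.sum() >= mask_ratio * valid.sum():
            break
        mask |= valid & (sector == s)  # mask an entire angular sector
    return mask, valid
```

Restricting the budget to valid patches means dark background regions never count toward reconstruction, and sector-level dropping forces the encoder to infer content along radial beams rather than from immediate neighbors.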