MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer

arXiv cs.CV · April 28, 2026

📰 News · Models & Research

Key Points

  • The paper proposes a data-efficient self-supervised pretraining method for nnFormer-based volumetric medical image segmentation using Masked Autoencoders (MAE).
  • It addresses the practical issue that transformer segmentation models often require large labeled datasets, risk overfitting, and can be unstable to train, which is costly in medical domains.
  • The method pretrains the model on abundant unlabeled 3D medical images by reconstructing randomly masked input regions to learn anatomical and structural representations.
  • The pretrained encoder is then fine-tuned on labeled data for the downstream segmentation task, improving performance (higher Dice scores), convergence speed, and generalization when labeled data is limited.
  • Overall, the results support self-supervised learning as a suitable approach to mitigate labeled-data scarcity in medical image analysis when paired with transformer-based segmentation architectures like nnFormer.
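To make the pretraining objective in the bullets above concrete, here is a minimal NumPy sketch of the MAE recipe applied to a 3D volume: split the volume into non-overlapping cubic patches, randomly mask a fixed fraction of them, and compute the reconstruction loss only on the masked patches. This is an illustrative sketch, not the paper's implementation; the function names, the 75% mask ratio (a common MAE default), and the toy volume sizes are assumptions.

```python
import numpy as np

def patchify(volume, p):
    # Split a DxHxW volume into non-overlapping p*p*p patches,
    # each flattened into a vector of length p**3.
    D, H, W = volume.shape
    return (volume
            .reshape(D // p, p, H // p, p, W // p, p)
            .transpose(0, 2, 4, 1, 3, 5)
            .reshape(-1, p ** 3))

def random_mask(n_patches, mask_ratio, rng):
    # Boolean mask marking which patches are hidden from the encoder.
    n_masked = int(n_patches * mask_ratio)
    perm = rng.permutation(n_patches)
    masked = np.zeros(n_patches, dtype=bool)
    masked[perm[:n_masked]] = True
    return masked

def mae_loss(pred_patches, target_patches, masked):
    # MAE objective: mean squared error over masked patches only;
    # visible patches do not contribute to the loss.
    diff = (pred_patches - target_patches) ** 2
    return diff[masked].mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.standard_normal((32, 32, 32))     # toy unlabeled volume
    patches = patchify(vol, 8)                  # shape (64, 512)
    masked = random_mask(len(patches), 0.75, rng)
    print(masked.sum(), "of", len(patches), "patches masked")
```

In the real pipeline, `pred_patches` would come from the nnFormer encoder plus a lightweight decoder; here the decoupling of masking and loss illustrates why the encoder must infer anatomy from the visible context alone.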

Abstract

Transformer architectures, including nnFormer, have demonstrated promising results in volumetric medical image segmentation by capturing long-range spatial interactions. Despite their strong performance, these models require large quantities of labeled training data, are prone to overfitting, and can be unstable during training. This is a serious practical problem because obtaining expert-annotated medical images is both time-consuming and expensive. Moreover, traditional fully supervised training pipelines fail to exploit the large amounts of unlabeled medical imaging data readily available in clinical settings. We address these drawbacks by improving the data efficiency of nnFormer with a self-supervised pretraining framework based on Masked Autoencoders (MAE). In this method, the model is pretrained on unlabeled volumetric medical images to reconstruct randomly masked parts of the input, which allows the encoder to learn meaningful anatomical and structural representations. The encoder is then fine-tuned on a labeled dataset for the downstream segmentation task. Experiments show that the proposed method achieves higher segmentation performance as measured by Dice score, faster convergence during fine-tuning, and better generalization with limited labeled data. These findings validate self-supervised learning combined with transformer-based segmentation models as an effective approach to the problem of labeled-data scarcity in medical image analysis.