Uni-Encoder Meets Multi-Encoders: Representation Before Fusion for Brain Tumor Segmentation with Missing Modalities

arXiv cs.CV / April 27, 2026


Key Points

  • The paper tackles multimodal MRI brain tumor segmentation when clinical scans lack one or more imaging modalities, a common situation that typically degrades performance.
  • It introduces UniME, a two-stage heterogeneous architecture that separates representation learning from segmentation to better balance fine-grained structure, cross-modal complementarity, and use of only available modalities.
  • In Stage 1, a single ViT “Uni-Encoder” is pretrained with masked image modeling to build a unified representation robust to missing modalities (a minimal sketch follows this list).
  • In Stage 2, modality-specific CNN “Multi-Encoders” extract high-resolution, multi-scale features, which are fused with the global representation to generate accurate segmentations.
  • Experiments on BraTS 2023 and BraTS 2024 indicate UniME outperforms prior methods in incomplete multimodal settings, and the authors provide code on GitHub.
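
To make the Stage-1 idea concrete, here is a minimal PyTorch sketch of masked image modeling with one shared ViT encoder that tokenizes whatever modalities are present. Every name, shape, and design detail below (per-modality patch embeddings, the mask ratio, 2D inputs, the pixel-reconstruction head) is an illustrative assumption, not the authors' implementation; see the linked repository for the real code.

```python
# Minimal sketch of Stage-1 masked image modeling with one shared ViT
# encoder. Shapes, names, and masking details are illustrative
# assumptions, not the authors' implementation; positional embeddings
# and the reconstruction loss are omitted for brevity.
import torch
import torch.nn as nn

class UniEncoderMIM(nn.Module):
    def __init__(self, n_modalities=4, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        # One patch embedding per modality, so any subset can be tokenized.
        self.patch_embeds = nn.ModuleList(
            nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
            for _ in range(n_modalities)
        )
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decoder = nn.Linear(dim, patch * patch)  # per-patch pixel recon

    def forward(self, images, present, mask_ratio=0.6):
        # images: one (B, 1, H, W) tensor per modality; present: list of bools
        tokens = []
        for img, embed, ok in zip(images, self.patch_embeds, present):
            if not ok:
                continue  # missing modalities are simply never tokenized
            t = embed(img).flatten(2).transpose(1, 2)          # (B, N, dim)
            keep = torch.rand(t.shape[:2], device=t.device) >= mask_ratio
            t = torch.where(keep.unsqueeze(-1), t,
                            self.mask_token.expand_as(t))      # mask tokens
            tokens.append(t)
        x = torch.cat(tokens, dim=1)  # unified sequence over available inputs
        z = self.encoder(x)           # shared, modality-agnostic representation
        return self.decoder(z)        # reconstruction targets for the MIM loss

# Toy usage: four modalities (e.g. T1, T1ce, T2, FLAIR slices), one absent.
model = UniEncoderMIM()
imgs = [torch.randn(2, 1, 64, 64) for _ in range(4)]
recon = model(imgs, present=[True, True, False, True])  # third input ignored
```

Because unavailable modalities are skipped at tokenization time rather than imputed, the same encoder weights serve every modality subset, which is what a unified representation robust to missing inputs requires.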

Abstract

Multimodal MRI offers complementary information for brain tumor segmentation, but clinical scans often lack one or more modalities, which degrades segmentation performance. In this paper, we propose UniME (Uni-Encoder Meets Multi-Encoders), a two-stage heterogeneous method for brain tumor segmentation with missing modalities that reconciles the trade-offs among fine-grained structure capture, cross-modal complementarity modeling, and exploitation of the available modalities. The key idea is to decouple representation learning from segmentation via a two-stage heterogeneous architecture. Stage 1 pretrains a single ViT Uni-Encoder with masked image modeling to establish a unified representation robust to missing modalities. Stage 2 adds modality-specific CNN Multi-Encoders to extract high-resolution, multi-scale, fine-grained features, which are fused with the global representation to produce precise segmentations. Experiments on BraTS 2023 and BraTS 2024 show that UniME outperforms previous methods under incomplete multimodal scenarios. The code is available at https://github.com/Hooorace-S/UniME
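
To complement the Stage-1 sketch above, here is an equally hypothetical sketch of the Stage-2 pattern the abstract describes: modality-specific CNN encoders supply high-resolution local features from whatever inputs are available, and a lightweight fusion combines them with the global Uni-Encoder representation before segmentation. The specific fusion (averaging available branches, then concatenating with an upsampled projection of the ViT features) is an assumption; the paper's actual module may differ.

```python
# Minimal sketch of Stage-2 fusion: per-modality CNN encoders extract
# fine-grained local features, which are fused with the global Stage-1
# representation before a segmentation head. The fusion operator
# (concat + 1x1 conv), the 2D convolutions (BraTS volumes are 3D; 2D is
# used for brevity), and all names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiEncoderSeg(nn.Module):
    def __init__(self, n_modalities=4, cnn_dim=32, vit_dim=256, n_classes=4):
        super().__init__()
        self.cnn_encoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, cnn_dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(cnn_dim, cnn_dim, 3, padding=1), nn.ReLU(),
            )
            for _ in range(n_modalities)
        )
        self.vit_proj = nn.Conv2d(vit_dim, cnn_dim, 1)    # project ViT features
        self.head = nn.Conv2d(cnn_dim * 2, n_classes, 1)  # fused -> logits

    def forward(self, images, present, vit_feat):
        # images: one (B, 1, H, W) tensor per modality; present: list of bools
        # vit_feat: (B, vit_dim, H/16, W/16) global map from the Uni-Encoder
        feats = [enc(img) for enc, img, ok
                 in zip(self.cnn_encoders, images, present) if ok]
        local = torch.stack(feats).mean(0)  # pool over available modalities
        g = F.interpolate(self.vit_proj(vit_feat),
                          size=local.shape[-2:], mode="bilinear",
                          align_corners=False)
        return self.head(torch.cat([local, g], dim=1))  # per-pixel logits

# Toy usage with the same missing-modality pattern as above.
seg = MultiEncoderSeg()
imgs = [torch.randn(2, 1, 64, 64) for _ in range(4)]
vit_feat = torch.randn(2, 256, 4, 4)  # stand-in for the Stage-1 output
logits = seg(imgs, present=[True, True, False, True], vit_feat=vit_feat)
```

In a full model, vit_feat would come from the frozen or fine-tuned Stage-1 encoder and the 1×1 head would be replaced by a proper decoder; the snippet only isolates the representation-before-fusion idea from the title.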