GPAFormer: Graph-guided Patch Aggregation Transformer for Efficient 3D Medical Image Segmentation

arXiv cs.CV / 4/9/2026


Key Points

  • The paper introduces GPAFormer, a lightweight transformer-based architecture aimed at efficient and accurate 3D medical image segmentation across multiple modalities and organs.
  • GPAFormer’s design centers on two modules: MASA (multi-scale attention-guided stacked aggregation) for handling structures at different sizes, and MPGA (mutual-aware patch graph aggregator) for graph-guided aggregation using patch feature similarity and spatial adjacency.
  • Experiments on the public whole-body CT/MRI datasets BTCV, Synapse, ACDC, and BraTS report the overall highest segmentation performance among compared networks while using only 1.81M parameters.
  • Reported DSC scores include 75.70% on BTCV, 81.20% on Synapse, 89.32% on ACDC, and 82.74% on BraTS, indicating a strong balance between accuracy and compactness.
  • The method is presented as practical for real settings, with sub-second inference on a consumer GPU for a validation case in BTCV, targeting resource-constrained clinical environments.
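The multi-scale idea behind MASA (parallel paths with different receptive fields processed over the same input and then aggregated) can be sketched in miniature. The 1D moving-average stand-in below is a hypothetical simplification, not the paper's 3D attention-guided module; function names and window sizes are assumptions:

```python
# Minimal 1D sketch of multi-scale parallel paths (hypothetical simplification):
# three moving-average "paths" with different window sizes stand in for
# convolutions with different receptive fields; their outputs are averaged
# per position. The actual MASA operates on 3D feature maps with
# attention-guided stacked aggregation.

def moving_average(signal, window):
    """Average over a centered window, clipped at the signal borders."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def multi_scale_aggregate(signal, windows=(1, 3, 5)):
    """Run parallel paths with different receptive fields and average them."""
    paths = [moving_average(signal, w) for w in windows]
    return [sum(vals) / len(paths) for vals in zip(*paths)]

features = [0.0, 0.0, 1.0, 0.0, 0.0]  # a single "edge" activation
print(multi_scale_aggregate(features))
```

The fine path (window 1) preserves the sharp activation while the coarse paths spread context from neighbouring positions, so the aggregate responds to both small and large structures.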

Abstract

Deep learning has been widely applied to 3D medical image segmentation. However, due to the diversity of imaging modalities, the high-dimensional nature of the data, and the heterogeneity of anatomical structures, achieving both segmentation accuracy and computational efficiency in multi-organ segmentation remains challenging. This study proposes GPAFormer, a lightweight network architecture designed for 3D medical image segmentation that emphasizes efficiency while maintaining high accuracy. GPAFormer incorporates two core modules: the multi-scale attention-guided stacked aggregation (MASA) and the mutual-aware patch graph aggregator (MPGA). MASA uses three parallel paths with different receptive fields, combined through planar aggregation, to strengthen the network's capability to handle structures of varying sizes. MPGA employs a graph-guided approach that dynamically aggregates regions with similar feature distributions based on inter-patch feature similarity and spatial adjacency, thereby improving the discrimination of both the internal and boundary structures of organs. Experiments were performed on the public whole-body CT and MRI datasets BTCV, Synapse, ACDC, and BraTS. Compared with existing 3D segmentation networks, GPAFormer, using only 1.81 M parameters, achieved the overall highest DSC on BTCV (75.70%), Synapse (81.20%), ACDC (89.32%), and BraTS (82.74%). On a consumer-level GPU, inference for one BTCV validation case took less than one second. These results demonstrate that GPAFormer balances accuracy and efficiency in multi-organ, multi-modality 3D segmentation across clinical scenarios, especially in resource-constrained and time-sensitive environments.
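The abstract's description of MPGA, dynamic aggregation of patches based on feature similarity and spatial adjacency, can be illustrated with a minimal sketch. All names, the adjacency rule, and the normalization below are assumptions for illustration, not the paper's implementation:

```python
# Hedged sketch of graph-guided patch aggregation (not the paper's exact MPGA):
# edge weights combine inter-patch cosine similarity with a spatial adjacency
# gate, and each patch's features are re-estimated as a weighted average of
# its neighbours. Patch layout, adjacency radius, and normalization are
# illustrative assumptions.
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def patch_graph_aggregate(patches, positions, radius=1.0):
    """Aggregate each patch with spatially adjacent, feature-similar patches."""
    n = len(patches)
    out = []
    for i in range(n):
        # Edge weight: similarity, gated by spatial adjacency (distance <= radius).
        weights = []
        for j in range(n):
            dist = math.dist(positions[i], positions[j])
            sim = cosine(patches[i], patches[j])
            weights.append(max(sim, 0.0) if dist <= radius else 0.0)
        total = sum(weights) or 1.0
        out.append([
            sum(w * patches[j][k] for j, w in enumerate(weights)) / total
            for k in range(len(patches[i]))
        ])
    return out

# Three patches on a line: the first two share features, the third differs.
patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
positions = [(0.0,), (1.0,), (2.0,)]
print(patch_graph_aggregate(patches, positions))
```

In this toy case the two similar patches reinforce each other, while the dissimilar neighbour contributes a zero-weight edge and keeps its own features, which is the intuition behind sharpening organ boundaries while smoothing interiors.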