Addressing Data Scarcity in 3D Trauma Detection through Self-Supervised and Semi-Supervised Learning with Vertex Relative Position Encoding
arXiv cs.CV / 3/16/2026
Key Points
- The paper proposes a label-efficient approach combining self-supervised pre-training (masked image modeling) with semi-supervised detection for 3D medical imaging, addressing scarce annotated data.
- It pre-trains a 3D U-Net encoder on 1,206 CT volumes without annotations, learning robust anatomical representations via patch-based MIM.
- Downstream tasks include 3D injury detection and multi-label injury classification. For detection, the model uses VDETR with Vertex Relative Position Encoding and reaches 56.57% validation mAP@0.50 and 45.30% test mAP@0.50 with only 144 labeled samples, a 115% relative improvement over purely supervised training.
- For classification, with 2,244 labeled samples, the model attains 94.07% test accuracy across seven injury categories using a frozen pre-trained encoder, demonstrating the transferability of the self-supervised features.
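The pre-training recipe summarized above, patch-based masked image modeling on unlabeled CT volumes, can be sketched minimally as follows. The volume size, patch size, and mask ratio here are illustrative assumptions, not values from the paper; in the full pipeline, a 3D U-Net would be trained to reconstruct the zeroed-out patches.

```python
import numpy as np

def mask_3d_patches(volume, patch=8, mask_ratio=0.6, seed=0):
    """Zero out a random subset of non-overlapping 3D patches (MIM-style).

    Returns the masked volume and a boolean grid marking which patches were
    masked; an encoder/decoder would be trained to reconstruct those patches.
    """
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    grid = (d // patch, h // patch, w // patch)
    n = grid[0] * grid[1] * grid[2]

    rng = np.random.default_rng(seed)
    chosen = rng.permutation(n)[: int(n * mask_ratio)]
    mask = np.zeros(n, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(grid)

    out = volume.copy()
    for i, j, k in zip(*np.nonzero(mask)):
        out[i * patch:(i + 1) * patch,
            j * patch:(j + 1) * patch,
            k * patch:(k + 1) * patch] = 0.0
    return out, mask

# Toy 32^3 "CT volume" (assumed size for illustration)
vol = np.random.default_rng(1).normal(size=(32, 32, 32)).astype(np.float32)
masked_vol, mask = mask_3d_patches(vol, patch=8, mask_ratio=0.6)
print(mask.mean())  # fraction of patches masked (close to 0.6)
```

The reconstruction loss would then be computed only over the masked patches, which is what forces the encoder to learn anatomical context from the visible ones.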
Related Articles
[D] Matryoshka Representation Learning
Reddit r/MachineLearning
Two new Qwen3.5 "Neo" fine-tunes focused on fast, efficient reasoning
Reddit r/LocalLLaMA

HKIC, Gobi Partners and HKU team up for fund backing university research start-ups
SCMP Tech
Yann LeCun’s New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling
MarkTechPost
Streaming experts
Simon Willison's Blog