3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance

arXiv cs.CV / 5/7/2026


Key Points

  • The study proposes generating CT-like pseudo-CT volumes from 3D ultrasound (UD-pCT) to reduce reliance on ionizing radiation from conventional CT while addressing ultrasound’s operator dependence and limited tissue quantification.
  • It uses paired 3D kidney ultrasound and CT scans from the TRUSTED dataset, aligned via landmark-based multimodal registration, to create supervised training data for an adversarial learning framework.
  • The core model, Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D), combines a 3D residual encoder-decoder with a transformer bottleneck to capture both local anatomical detail and long-range 3D dependencies.
  • A 3D Conditional PatchGAN discriminator is introduced to improve local structural realism in the synthesized pseudo-CT volumes, and experiments report improved PSNR/SSIM versus established baselines.
  • The authors emphasize that the synthesized volumes could serve as a real-time anatomical reference for operator guidance, lowering acquisition variability and reducing unnecessary CT exams, while noting a key limitation: the small paired dataset may restrict generalizability.
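The transformer bottleneck in BT-ResUNet3D models long-range volumetric dependencies by treating each spatial position of the encoder's downsampled 3D feature map as a token. The paper summary does not specify the exact tokenization, so the following is a hedged plain-Python sketch of the generic flatten/unflatten step that any such bottleneck needs (attention itself is omitted); all function names here are illustrative, not the authors':

```python
def volume_to_tokens(feat):
    """Flatten a 3D feature map (nested [D][H][W] lists of channel
    vectors) into a token sequence of length D*H*W, in z-y-x order,
    as a transformer bottleneck would consume it."""
    return [feat[z][y][x]
            for z in range(len(feat))
            for y in range(len(feat[0]))
            for x in range(len(feat[0][0]))]

def tokens_to_volume(tokens, d, h, w):
    """Invert volume_to_tokens: reshape the (attention-processed)
    token sequence back into a [D][H][W] feature map."""
    return [[[tokens[(z * h + y) * w + x] for x in range(w)]
             for y in range(h)]
            for z in range(d)]

# Toy 2x2x2 feature map with one channel per voxel.
d, h, w = 2, 2, 2
vol = [[[[z * 4 + y * 2 + x] for x in range(w)]
        for y in range(h)]
       for z in range(d)]
tokens = volume_to_tokens(vol)
assert tokens_to_volume(tokens, d, h, w) == vol  # lossless round trip
print(len(tokens))  # one token per voxel position: 8
```

In a real implementation these tokens carry the encoder's channel dimension and receive positional encodings before self-attention; the point of the sketch is only the bijection between voxel positions and tokens that lets attention relate distant anatomy.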

Abstract

Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generates CT-like anatomical reference volumes inferred from US, without aiming to reproduce physically accurate Hounsfield Units. Paired 3D kidney US and CT volumes from the TRUSTED dataset are first spatially aligned using a landmark-based multimodal registration pipeline, creating high-quality paired inputs for supervised training of an adversarial framework. The proposed Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) model employs a 3D residual encoder-decoder generator augmented with a transformer bottleneck, enabling effective modeling of fine-grained local anatomical structures as well as long-range volumetric dependencies, while a 3D Conditional PatchGAN discriminator enforces local structural realism in the synthesized pseudo-CT volumes. Quantitative evaluation using PSNR and SSIM demonstrates that the proposed method outperforms established baselines in structural fidelity and perceptual image quality. The UD-pCT volumes provide real-time anatomical reference for operator guidance, potentially reducing acquisition variability and unnecessary CT use. A limitation of this study is the relatively small paired dataset, which may limit the generalizability of the proposed model.
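The quantitative evaluation relies on PSNR and SSIM. PSNR is simple enough to define in a few lines; the sketch below is the standard textbook definition, not the authors' evaluation code, and SSIM is omitted because it requires windowed local statistics. The toy values are invented for illustration:

```python
import math

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio between two equal-length intensity
    sequences (a 3D volume flattened to 1-D here for simplicity)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical volumes
    return 10.0 * math.log10(data_range ** 2 / mse)

# Toy example: four voxels of a ground-truth CT vs. a pseudo-CT.
gt = [100.0, 120.0, 130.0, 140.0]
pred = [102.0, 118.0, 131.0, 139.0]
print(round(psnr(gt, pred), 2))  # ≈ 44.15 dB
```

Higher PSNR means lower mean squared error against the reference CT; SSIM complements it by scoring local structural similarity, which is why the paper reports both.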