Robustness of Transformer-Based Fluence Map Prediction Under Clinically Realistic Perturbations

arXiv cs.CV / 5/5/2026


Key Points

  • The study evaluates how a two-stage transformer pipeline for IMRT fluence map prediction holds up under clinically realistic distribution shifts and perturbations.
  • It compares fluence-stage transformer backbones with hierarchical, global, and hybrid attention, all trained with a physics-informed loss that enforces energy consistency.
  • Performance degrades smoothly under moderate geometric and radiometric perturbations, but fails sharply under severe rotations and strong noise.
  • Hierarchical transformers such as SwinUNETR are more robust: their upper-quartile energy error grows more slowly as perturbations intensify.
  • The authors find that relying on SSIM alone is insufficient for clinically meaningful error assessment, motivating physics-informed evaluation metrics beyond image similarity.
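
To make the idea of an energy-consistency loss concrete, here is a minimal sketch in plain Python. The summary does not give the authors' loss, so everything here is an assumption for illustration: the hypothetical function name energy_consistent_loss, the use of the summed beamlet intensities as the "energy" of a fluence map, the additive form (mean squared error plus a squared relative energy mismatch), and the weight of 0.1.

```python
def energy_consistent_loss(pred, ref, energy_weight=0.1):
    """Illustrative loss: pixel-wise MSE plus a penalty on total-energy mismatch.

    Assumption: "energy" is the sum of all beamlet intensities in the map.
    This is a sketch, not the loss used in the paper.
    """
    p = [v for row in pred for v in row]
    r = [v for row in ref for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(p, r)) / len(r)
    ref_energy = sum(r)
    # Squared relative difference between predicted and reference totals.
    energy_term = ((sum(p) - ref_energy) / ref_energy) ** 2
    return mse + energy_weight * energy_term


# Toy 2x2 reference fluence map (arbitrary units).
ref = [[0.2, 0.4], [0.6, 0.8]]
# Same per-pixel error magnitude (0.05), but the errors cancel in the total.
balanced = [[0.25, 0.35], [0.65, 0.75]]
# Same per-pixel error magnitude, but every beamlet is too high.
biased = [[0.25, 0.45], [0.65, 0.85]]

loss_balanced = energy_consistent_loss(balanced, ref)
loss_biased = energy_consistent_loss(biased, ref)
print(loss_balanced, loss_biased)
```

Both toy predictions have the same mean squared error, so a purely pixel-wise loss cannot tell them apart. The energy term penalizes only the prediction that delivers more total fluence than the reference, which is the kind of systematic bias an energy-consistency constraint is meant to discourage.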

Abstract

Learning-based fluence map prediction offers a fast alternative to iterative inverse planning in intensity-modulated radiation therapy (IMRT), but its robustness under realistic distribution shifts remains unclear. We study a two-stage transformer pipeline that maps anatomy (CT and contours) to dose and then to beamlet fluence maps. We compare fluence-stage transformer backbones with hierarchical, global, and hybrid attention, trained with a physics-informed loss enforcing energy consistency. Robustness is evaluated under geometric perturbations, radiometric noise, reduced training data, and domain shifts using a prostate IMRT dataset, with additional evaluation of the dose stage on public datasets. Results show smooth degradation under moderate perturbations but sharp failures under severe rotations and noise. Hierarchical transformers (e.g., SwinUNETR) exhibit slower growth in upper-quartile energy error, indicating improved robustness. We further show that SSIM alone fails to capture clinically relevant errors, highlighting the need for physics-informed evaluation.
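
The point that SSIM alone can miss clinically relevant errors can be illustrated with a toy example. The sketch below is not the paper's evaluation code. It assumes that the energy of a fluence map is the sum of its beamlet intensities, and it uses a single-window (global) SSIM over the whole map rather than the usual sliding-window version. The helper names and the 4x4 map are hypothetical.

```python
from statistics import fmean


def total_energy(fluence):
    """Sum of all beamlet intensities (assumed proxy for delivered energy)."""
    return sum(sum(row) for row in fluence)


def relative_energy_error(pred, ref):
    """Absolute difference in total energy, relative to the reference."""
    ref_energy = total_energy(ref)
    return abs(total_energy(pred) - ref_energy) / ref_energy


def global_ssim(pred, ref, data_range=1.0):
    """SSIM computed once over the whole map (simplified, no sliding window)."""
    x = [v for row in pred for v in row]
    y = [v for row in ref for v in row]
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (vx + vy + c2)
    return luminance * contrast_structure


# Toy 4x4 reference fluence map with intensities in [0, 1].
ref = [
    [0.0, 0.2, 0.2, 0.0],
    [0.2, 0.8, 0.8, 0.2],
    [0.2, 0.8, 0.8, 0.2],
    [0.0, 0.2, 0.2, 0.0],
]
# Prediction with the right shape but every beamlet 10% too intense.
pred = [[1.1 * v for v in row] for row in ref]

ssim_value = global_ssim(pred, ref)
energy_err = relative_energy_error(pred, ref)
print(f"SSIM={ssim_value:.3f}, relative energy error={energy_err:.3f}")
```

In this toy case the predicted map has exactly the reference's spatial structure, so the similarity score stays close to 1, yet the total fluence is off by 10%. An energy-based metric exposes the discrepancy that the image-similarity score largely hides.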