A Unified Spatial Alignment Framework for Highly Transferable Transformation-Based Attacks on Spatially Structured Tasks

arXiv cs.CV / 3/27/2026


Key Points

  • The paper notes that transformation-based adversarial attacks (TAAs) transfer well for image classification but can fail or perform poorly on spatially structured tasks like semantic segmentation and object detection.
  • It argues that the root cause is spatial misalignment: in structured tasks the labels themselves are spatially structured, so applying a spatial transformation to the input without applying the same transformation to the label yields erroneous gradients for the attack.
  • The authors propose a unified Spatial Alignment Framework (SAF) that spatially transforms labels synchronously with inputs via a Spatial Alignment (SA) algorithm to maintain alignment during attacks.
  • Experiments show SAF is crucial for structured tasks: in non-targeted attacks it drives average segmentation mIoU and detection mAP substantially lower than attacks without the framework (Cityscapes mIoU 24.50→11.34; Kvasir-SEG mIoU 49.91→31.80; COCO mAP 17.89→5.25 in the paper’s reported comparisons).
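
The misalignment described in the points above can be illustrated with a toy segmentation example. This is a minimal sketch, not the paper's SA algorithm: the helper names (hflip, pixel_agreement, toy_model) are hypothetical, the "model" is a simple threshold, and a horizontal flip stands in for any spatial transformation.

```python
import numpy as np

def hflip(arr):
    """Flip an array horizontally along its last (width) axis."""
    return arr[..., ::-1]

def pixel_agreement(pred, label):
    """Fraction of pixels where the predicted class equals the label."""
    return float((pred == label).mean())

def toy_model(x):
    """Hypothetical per-pixel classifier: thresholds the pixel value."""
    return (x > 0.5).astype(int)

# Toy 4x4 label map: left half is class 1, right half is class 0.
label = np.array([[1, 1, 0, 0]] * 4)
# Toy image whose pixel values simply encode the class (an assumption).
image = label.astype(np.float32)

# Spatially transform the input, as a transformation-based attack would.
pred = toy_model(hflip(image))

# Label left untouched: every pixel is compared against the wrong location.
misaligned = pixel_agreement(pred, label)
# Label flipped together with the input: the comparison is meaningful again.
aligned = pixel_agreement(pred, hflip(label))

print(misaligned, aligned)
```

With the untransformed label, a perfectly correct prediction looks entirely wrong, so any loss (and gradient) computed from that comparison points in a misleading direction. Transforming the label with the input removes the discrepancy.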

Abstract

Transformation-based adversarial attacks (TAAs) demonstrate strong transferability when deceiving classification models. However, existing TAAs often perform unsatisfactorily or even fail when applied to structured tasks such as semantic segmentation and object detection. Encouragingly, recent studies that categorize transformations into non-spatial and spatial transformations inspire us to address this challenge. We find that for non-structured tasks, labels are spatially non-structured, and thus TAAs are not required to adjust labels when applying spatial transformations. In contrast, for structured tasks, labels are spatially structured, and failing to transform labels synchronously with inputs can cause spatial misalignment and yield erroneous gradients. To address these issues, we propose a novel unified Spatial Alignment Framework (SAF) for highly transferable TAAs on spatially structured tasks, where the TAAs spatially transform labels synchronously with the input using the proposed Spatial Alignment (SA) algorithm. Extensive experiments demonstrate the crucial role of our SAF for TAAs on structured tasks. Specifically, in non-targeted attacks, our SAF degrades the average mIoU on Cityscapes from 24.50 to 11.34, and on Kvasir-SEG from 49.91 to 31.80, while reducing the average mAP of COCO from 17.89 to 5.25.
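
For object detection, "transforming labels synchronously with the input" means the bounding boxes must move with the pixels. The sketch below shows this for a horizontal flip only. It is an illustration under stated assumptions, not the authors' implementation: hflip_boxes is a hypothetical helper, and boxes are assumed to be [x_min, y_min, x_max, y_max] in pixel-edge coordinates.

```python
import numpy as np

def hflip_boxes(boxes, image_width):
    """Mirror [x_min, y_min, x_max, y_max] boxes across the vertical centre line."""
    boxes = np.asarray(boxes, dtype=np.float32)
    flipped = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    flipped[:, 0] = image_width - boxes[:, 2]
    flipped[:, 2] = image_width - boxes[:, 0]
    return flipped

# Toy image in (channels, height, width) layout with one bright object.
image = np.zeros((3, 100, 200), dtype=np.float32)
image[:, 20:60, 10:50] = 1.0               # object spans x 10-50, y 20-60
boxes = np.array([[10, 20, 50, 60]])       # matching ground-truth box

# Apply the same spatial transformation to the input and to its label.
flipped_image = image[..., ::-1]
flipped_boxes = hflip_boxes(boxes, image_width=image.shape[-1])

# Crop the flipped image with the flipped box: it should cover the object.
x0, y0, x1, y1 = flipped_boxes[0].astype(int)
covered = flipped_image[:, y0:y1, x0:x1]
print(flipped_boxes, covered.min())
```

Other spatial transformations (scaling, translation, rotation) would need their own box mapping; the flip is used here only because its mapping is easy to verify by hand.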