Physical Adversarial Attacks on AI Surveillance Systems: Detection, Tracking, and Visible–Infrared Evasion

arXiv cs.CV / 4/9/2026


Key Points

  • The paper argues that physical adversarial attacks should be evaluated in surveillance-like settings where detection, multi-object tracking, and visible–infrared sensing interact over time.
  • It explains why per-frame RGB results can be misleading for real systems, especially for night-time or dual-modal (visible + thermal) deployments.
  • The review organizes prior work into a four-part taxonomy built on four technical dimensions: temporal persistence, sensing modality, realism of the physical attack carrier, and system-level attack objective.
  • It discusses how recent work on multi-object tracking evasion, dual-modal visible–infrared attacks, and controllable clothing illustrates a shift in how the field should interpret robustness.
  • It highlights unresolved evaluation gaps such as robustness to distance and camera-pipeline variation, the need for identity-level metrics, and testing that accounts for activation-aware threats.

Abstract

Physical adversarial attacks are increasingly studied in settings that resemble deployed surveillance systems rather than isolated image benchmarks. In these settings, person detection, multi-object tracking, visible–infrared sensing, and the practical form of the attack carrier all matter at once. This changes how the literature should be read. A perturbation that suppresses a detector in one frame may have limited practical effect if identity is recovered over time; an RGB-only result may say little about night-time systems that rely on visible and thermal inputs together; and a conspicuous patch can imply a different threat model from a wearable or selectively activated carrier. This paper reviews physical attacks from that surveillance-oriented viewpoint. Rather than attempting a complete catalogue of all physical attacks in computer vision, we focus on the technical questions that become central in surveillance: temporal persistence, sensing modality, carrier realism, and system-level objective. We organize prior work through a four-part taxonomy and discuss how recent results on multi-object tracking, dual-modal visible–infrared evasion, and controllable clothing reflect a broader change in the field. We also summarize evaluation practices and unresolved gaps, including distance robustness, camera-pipeline variation, identity-level metrics, and activation-aware testing. The resulting picture is that surveillance robustness cannot be judged reliably from isolated per-frame benchmarks alone; it has to be examined as a system problem unfolding over time, across sensors, and under realistic physical deployment constraints.
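
The gap between per-frame and identity-level evaluation can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the `max_age` parameter, and the toy tracker rule (a track survives up to `max_age` consecutive missed frames, then is dropped) are all assumptions made for illustration.

```python
from typing import List


def per_frame_success_rate(detected: List[bool]) -> float:
    """Fraction of frames in which the detector missed the target."""
    if not detected:
        return 0.0
    return sum(1 for d in detected if not d) / len(detected)


def identity_breaks(detected: List[bool], max_age: int) -> int:
    """Count re-detections that arrive after the toy tracker dropped the track.

    Hypothetical tracker rule (an assumption, not a real tracker's API):
    a track survives up to `max_age` consecutive missed frames. A longer
    gap drops it, so the next detection would start a new identity.
    """
    breaks = 0
    missed_run = 0
    tracked = False
    for d in detected:
        if d:
            if tracked and missed_run > max_age:
                breaks += 1  # target reappears under a new identity
            tracked = True
            missed_run = 0
        else:
            missed_run += 1
    return breaks


# Two hypothetical 20-frame sequences with the same per-frame miss rate.
flicker = [True, False] * 10                      # misses are isolated
sustained = [True] * 5 + [False] * 10 + [True] * 5  # one long gap

for name, seq in [("flicker", flicker), ("sustained", sustained)]:
    rate = per_frame_success_rate(seq)
    breaks = identity_breaks(seq, max_age=3)
    print(f"{name}: per-frame miss rate={rate:.2f}, identity breaks={breaks}")
```

Both sequences miss the target in half of the frames, yet only the sustained gap breaks the identity under this toy rule, which is the kind of difference a per-frame metric cannot show. Real trackers also use motion and appearance cues, so the threshold rule here is a deliberate simplification.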