PoInit-of-View: Poisoning Initialization of Views Transfers Across Multiple 3D Reconstruction Systems

arXiv cs.CV / 4/21/2026


Key Points

  • The paper shows that poisoning a 3D reconstruction pipeline can be made more transferable by targeting the structure-from-motion (SfM) initialization module rather than backpropagating through the whole system.
  • It introduces PoInit-of-View, which learns adversarial perturbations that create cross-view gradient inconsistencies at corresponding 3D point projections, breaking keypoint detection and feature matching.
  • These disruptions corrupt pose estimation and triangulation in SfM, ultimately producing low-quality rendered views.
  • The authors provide theory linking cross-view inconsistency to correspondence collapse, explaining why the attack works across systems.
  • Experiments report stronger black-box transfer, with the attack surpassing a single-view baseline by 25.1% in PSNR and 16.5% in SSIM in settings such as 3DGS to NeRF.
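
For readers unfamiliar with the metric behind the last point, here is a minimal sketch of the standard PSNR definition, which scores a rendered view against a reference image. The function name `psnr` and the `max_value` parameter are illustrative choices, not taken from the paper, and the summary does not say how the 25.1% figure is derived from per-view scores.

```python
import numpy as np

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB (standard definition).

    Assumes both images share one shape and have intensities
    in [0, max_value]. Higher values mean a closer match.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A poisoning attack of this kind aims to lower the PSNR of views rendered by the victim system, so a stronger attack corresponds to a lower score.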

Abstract

Poisoning input views of 3D reconstruction systems has been recently studied. However, we identify that existing studies simply backpropagate adversarial gradients through the 3D reconstruction pipeline as a whole, without uncovering the new vulnerability rooted in specific modules of the 3D reconstruction pipeline. In this paper, we argue that the structure-from-motion (SfM) initialization, as the geometric core of many widely used reconstruction systems, can be targeted to achieve transferable poisoning effects across diverse 3D reconstruction systems. To this end, we propose PoInit-of-View, which optimizes adversarial perturbations to intentionally introduce cross-view gradient inconsistencies at projections of corresponding 3D points. These inconsistencies disrupt keypoint detection and feature matching, thereby corrupting pose estimation and triangulation within SfM, eventually resulting in low-quality rendered views. We also provide a theoretical analysis that connects cross-view inconsistency to correspondence collapse. Experimental results demonstrate the effectiveness of our PoInit-of-View on diverse 3D reconstruction systems and datasets, surpassing the single-view baseline by 25.1% in PSNR and 16.5% in SSIM in black-box transfer settings, such as 3DGS to NeRF.
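
The abstract does not give the loss itself, so the following is only an illustrative sketch of what "cross-view gradient inconsistency at projections of corresponding 3D points" could mean in code. It assumes two grayscale views and known pixel correspondences, and it measures how much the image gradient directions disagree at matched pixels. The names `image_gradients` and `gradient_inconsistency`, the cosine-based score, and the decision to ignore the geometric transform between views are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def image_gradients(img):
    """Finite-difference gradients of a 2D grayscale image."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return gx, gy

def gradient_inconsistency(img_a, img_b, pts_a, pts_b, eps=1e-8):
    """Mean (1 - cosine similarity) of gradients at matched pixels.

    pts_a and pts_b are (N, 2) integer arrays of (row, col) positions
    where the same 3D points project into each view (hypothetical input).
    Returns about 0 when gradient directions agree and about 2 when they
    are opposite.
    """
    gx_a, gy_a = image_gradients(img_a)
    gx_b, gy_b = image_gradients(img_b)
    pts_a = np.asarray(pts_a, dtype=int)
    pts_b = np.asarray(pts_b, dtype=int)
    # Gather the 2D gradient vector at each corresponding pixel.
    g_a = np.stack([gx_a[pts_a[:, 0], pts_a[:, 1]],
                    gy_a[pts_a[:, 0], pts_a[:, 1]]], axis=1)
    g_b = np.stack([gx_b[pts_b[:, 0], pts_b[:, 1]],
                    gy_b[pts_b[:, 0], pts_b[:, 1]]], axis=1)
    norms = np.linalg.norm(g_a, axis=1) * np.linalg.norm(g_b, axis=1)
    cosine = np.sum(g_a * g_b, axis=1) / (norms + eps)
    return float(np.mean(1.0 - cosine))
```

Under these assumptions, an attacker would optimize perturbations to raise this score, whereas clean views of the same surface would be expected to score low. A practical version would need differentiable sampling at sub-pixel projections and a perturbation budget, neither of which is shown here.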