Adjoint Inversion Reveals Holographic Superposition and Destructive Interference in CNN Classifiers

arXiv cs.CV / 5/1/2026


Key Points

  • The paper introduces a hallucination-free CNN inversion framework that uses magnitude–phase decoupling and Local Adjoint Correctors to ensure reconstruction gradients come only from truly active channels.
  • Using this geometric probe, the authors provide pixel-level evidence that vision encoders exhibit strong holographic superposition across channels, where positive and negative per-channel reconstructions are visually and energetically indistinguishable.
  • The study shows that classification arises from destructive interference: classifier weights cancel a shared background direction in pixel space while constructively assembling class-discriminative residuals, directly refuting the Spatial Funnel Hypothesis.
  • The authors connect the required channel set to an interference subspace volume and prove it is dual to the GAP covariance determinant, enabling a covariance-volume channel selection algorithm with a (1−1/e) approximation guarantee.
  • The framework is shown to extend to attention-based heads without retraining and characterizes out-of-distribution failures as a measurable collapse of the covariance volume needed for interference-based classification.
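The destructive-interference claim can be illustrated with a small numerical sketch. Below, every per-channel "reconstruction" is dominated by the same background direction plus a small weight-aligned foreground residual, so individual channels are energetically indistinguishable; yet under mean-centered classifier weights the shared background cancels and the foreground assembles. This is a toy model of the mechanism described above, not the paper's actual inversion framework; all vectors and coefficients here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_pixels = 64, 256

def unit(v):
    return v / np.linalg.norm(v)

# Orthogonal "background" and "foreground" directions in pixel space
# (hypothetical stand-ins for the shared and class-discriminative directions).
background = unit(rng.normal(size=n_pixels))
foreground = rng.normal(size=n_pixels)
foreground = unit(foreground - (foreground @ background) * background)

# Hypothetical classifier weights with zero mean, so the shared background
# destructively interferes under the weighted sum.
w = rng.normal(size=n_channels)
w -= w.mean()

# Per-channel reconstruction: dominant shared background + small
# weight-aligned foreground residual + noise. Every channel "looks like"
# the background (the holographic property).
recons = (background[None, :]
          + 0.3 * w[:, None] * foreground[None, :]
          + 0.05 * rng.normal(size=(n_channels, n_pixels)))

# Individually, channels are nearly indistinguishable in energy ...
energies = np.linalg.norm(recons, axis=1)
print("relative energy spread:", energies.std() / energies.mean())

# ... but the weighted sum cancels the background and concentrates
# the remaining energy on the foreground direction.
summed = (w[:, None] * recons).sum(axis=0)
print("background energy:", abs(summed @ background))
print("foreground energy:", abs(summed @ foreground))
```

Because the weights are mean-centered, the background term sums to (Σ w_c)·background ≈ 0, while the foreground term sums to 0.3·(Σ w_c²)·foreground, which grows with the number of channels.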

Abstract

A foundational assumption in CNN interpretability -- that deep encoders suppress background pixels while classifiers merely select from a cleaned feature pool (the Spatial Funnel Hypothesis) -- remains untested due to spatial hallucinations in existing visualization tools. We address this by introducing a hallucination-free inversion framework built on magnitude-phase decoupling and Local Adjoint Correctors. Our method mathematically guarantees that the spatial gradient support of every reconstruction stems strictly from genuinely active channels. Using this framework as a geometric probe, we uncover the first pixel-level evidence of strong superposition in vision encoders. We show that per-channel inversions are uniformly holographic: positive and negative weight reconstructions are visually and energetically indistinguishable. However, their algebraic sum sharply concentrates on the foreground. This proves classification operates via destructive interference -- classifier weights cancel a shared background direction in pixel space and constructively assemble class-discriminative residuals, directly falsifying the Spatial Funnel Hypothesis. This interference model identifies the volume of the admissible interference subspace as the geometric quantity governing channel requirements. We prove this volume is dual to the GAP covariance determinant, yielding a covariance-volume channel selection algorithm with a (1-1/e) approximation guarantee. This algorithm mathematically reveals out-of-distribution (OOD) failure as a measurable collapse of the covariance volume essential for interference-based classification. Our framework extends seamlessly to attention-based heads without retraining.
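The (1−1/e) guarantee mentioned above is the classic bound for greedy maximization of a monotone submodular set function. A natural candidate objective with that structure is the regularized log-determinant of a covariance submatrix, which is monotone submodular for PSD matrices. The sketch below greedily selects channels under that assumed objective; the paper's exact functional, duality argument, and selection rule may differ, and the toy GAP-feature covariance is fabricated for illustration.

```python
import numpy as np

def greedy_covariance_volume(sigma, k):
    """Greedily pick k channels maximizing log det(I + Sigma[S, S]).

    For PSD Sigma, S -> log det(I + Sigma_S) is monotone submodular, so
    greedy selection enjoys the standard (1 - 1/e) approximation guarantee.
    (Illustrative objective, not necessarily the paper's exact one.)
    """
    n = sigma.shape[0]
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            sub = sigma[np.ix_(idx, idx)]
            # slogdet returns (sign, log|det|); take the log-magnitude.
            val = np.linalg.slogdet(np.eye(len(idx)) + sub)[1]
            if val > best_val:
                best, best_val = j, val
        selected.append(best)
    return selected

# Toy GAP-feature matrix: two high-variance channels among low-variance noise.
rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 8)) * 0.1
feats[:, 2] += rng.normal(size=500) * 2.0
feats[:, 5] += rng.normal(size=500) * 2.0
sigma = np.cov(feats, rowvar=False)
picked = greedy_covariance_volume(sigma, 2)
print("selected channels:", picked)
```

Under this objective, a collapse of the covariance volume (e.g. on out-of-distribution inputs, where channel activations become degenerate) shows up directly as a drop in the achievable log-determinant, matching the paper's characterization of OOD failure.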