Linear-Readout Floors and Threshold Recovery in Computation in Superposition

arXiv cs.LG / 5/5/2026

Key Points

The paper compares two recent approaches to computation in superposition, Hänni et al.’s approximate-linear recursive template and Adler & Shavit’s thresholded Boolean recovery, and argues that they are consistent because they preserve different interface invariants.
It records a Welch-type lower bound for biorthogonal linear readouts, showing that when the number of features F is much larger than the width d, the worst-case off-diagonal cross-talk of any unit-diagonal linear readout is Ω(d^{-1/2}).
At a quadratic feature load (F = d^2), the authors show random-support threshold recovery can succeed for sparsities s = O(d/log d), whereas linear readouts still suffer average per-coordinate squared error Ω(s/d) on Bernoulli sparse states.
By matching the Welch lower bound to the published tolerance of Hänni et al.’s correction layer, the paper explains why the computable-feature scale d^{3/2} appears as a compatibility threshold for that specific template rather than a universal upper limit.
The authors note that designing robust nonlinear reset methods beyond the Hänni et al. template remains an open problem.
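
The Welch-type cross-talk floor described above can be checked numerically. The following is a minimal sketch, not the paper's construction. It assumes the standard rank-trace argument: if M = RE has unit diagonal and rank at most d, then ‖M‖_F² ≥ (tr M)²/rank(M) ≥ F²/d, so the largest off-diagonal entry is at least sqrt((F − d)/(d(F − 1))), which is roughly d^{-1/2} when F ≫ d. Whether the paper states the bound with exactly this constant is an assumption, and the function names, the Gaussian embeddings and readouts, and the sizes d = 8, F = 64 are all hypothetical choices for illustration.

```python
import math
import random


def welch_floor(d: int, F: int) -> float:
    """Rank-trace lower bound on the largest off-diagonal entry of a
    unit-diagonal F x F matrix of rank at most d (requires F > d)."""
    return math.sqrt((F - d) / (d * (F - 1)))


def random_unit_diagonal_readout(d: int, F: int, rng: random.Random):
    """Return M = R E as a list of rows.

    E holds F random Gaussian embeddings in R^d. Each readout row r_i is a
    random Gaussian vector rescaled so that r_i . e_i = 1 (unit diagonal).
    This is an illustrative choice, not a readout taken from the paper.
    """
    E = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(F)]
    R = []
    for e in E:
        r = [rng.gauss(0.0, 1.0) for _ in range(d)]
        scale = sum(a * b for a, b in zip(r, e))
        R.append([a / scale for a in r])
    # M[i][j] = r_i . e_j, so M has rank at most d by construction.
    return [[sum(a * b for a, b in zip(r, e)) for e in E] for r in R]


def max_cross_talk(M) -> float:
    """Largest absolute off-diagonal entry of a square matrix."""
    n = len(M)
    return max(abs(M[i][j]) for i in range(n) for j in range(n) if i != j)


rng = random.Random(0)  # fixed seed so the run is reproducible
d, F = 8, 64
M = random_unit_diagonal_readout(d, F, rng)
floor = welch_floor(d, F)
worst = max_cross_talk(M)
print(f"Welch-type floor: {floor:.4f}, observed worst cross-talk: {worst:.4f}")
```

Random readouts of this kind typically sit well above the floor. The abstract's tightness claim concerns unit-norm tight frames, which this sketch does not construct.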

Abstract

Two recent approaches to computation in superposition reach different recursive capacity regimes: Hänni et al. certify $\tilde{O}(d^{3/2})$ computable features in width $d$ via an approximate-linear recursive template, while Adler and Shavit reach near-quadratic capacity (up to logarithmic factors) using thresholded Boolean recovery. The main contribution of this paper is conceptual: we argue these results are not contradictory because they maintain different interface invariants, and we formalize the distinction. As a tool, we record a rank-trace Welch-type lower bound for biorthogonal linear readouts: for $F \gg d$, the worst-case off-diagonal cross-talk of any unit-diagonal linear readout is $\Omega(d^{-1/2})$, and the bound is tight on average for unit-norm tight frames. At quadratic feature load $F=d^2$, random-support threshold recovery succeeds for sparsities $s=O(d/\log d)$, while linear readouts still incur $\Omega(s/d)$ average per-coordinate squared error on Bernoulli sparse states. Matching the Welch floor against the published tolerance of the Hänni correction layer explains the