Abstract
Spiking transformers achieve accuracy competitive with conventional transformers while offering $38$--$57\times$ energy efficiency on neuromorphic hardware, yet no theoretical framework guides their design. This paper establishes the first comprehensive expressivity theory for spiking self-attention. We prove that spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, giving explicit spike-circuit constructions, including a novel lateral-inhibition network for softmax normalization with proven $O(1/\sqrt{T})$ convergence. Via a rate-distortion argument, we derive tight information-theoretic spike-count lower bounds: $\varepsilon$-approximation requires $\Omega(L_f^2 nd/\varepsilon^2)$ spikes. Our key insight is input-dependent bounds based on measured effective dimensions ($d_{\text{eff}} = 47$--$89$ for CIFAR/ImageNet), which explain why $T=4$ timesteps suffice in practice despite worst-case predictions of $T \geq 10{,}000$. We provide concrete design rules with calibrated constants ($C = 2.3$, 95\% CI $[1.9, 2.7]$). Experiments on Spikformer, QKFormer, and SpikingResformer across vision and language benchmarks validate these predictions with $R^2 = 0.97$ ($p < 0.001$). Our framework thus provides the first principled foundation for neuromorphic transformer design.