Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension

arXiv cs.LG · April 20, 2026


Key Points

  • The paper introduces the first comprehensive expressivity theory for spiking self-attention, showing that spiking attention with Leaky Integrate-and-Fire neurons can universally approximate continuous permutation-equivariant functions.
  • It provides explicit spike-circuit constructions, including a novel lateral inhibition network that implements softmax normalization with provable convergence of order O(1/√T).
  • Using rate-distortion theory, the authors derive tight lower bounds on required spike counts for ε-approximation, showing a dependence of Ω(L_f² n d / ε²) on task and approximation parameters.
  • The key insight is that the required number of timesteps depends on an input-dependent “effective dimension,” with measured values d_eff = 47–89 on CIFAR/ImageNet explaining why T=4 timesteps suffice in practice despite worst-case predictions of T ≥ 10,000.
  • Experiments across Spikformer, QKFormer, and SpikingResformer for vision and language tasks support the theory, reporting strong fit (R^2=0.97, p<0.001) and calibrated design constants (C=2.3 with 95% CI [1.9, 2.7]).
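The summary does not spell out the paper's lateral-inhibition circuit, but the claimed O(1/√T) rate has a simple Monte Carlo reading: if a pool of neurons competes under divisive inhibition so that one neuron spikes per timestep with probability proportional to its exponentiated drive, the empirical spike rates estimate softmax, and their error shrinks as 1/√T. A minimal NumPy sketch of that idea (the function name and mechanism are illustrative stand-ins, not the authors' construction):

```python
import numpy as np

def spiking_softmax(x, T=1000, seed=0):
    """Estimate softmax(x) from spike rates under divisive lateral inhibition.

    At each timestep every neuron's firing probability is its exponentiated
    drive divided by the pooled (inhibitory) total drive, so exactly one
    winner spikes per step.  The empirical spike rate over T steps is a
    Monte Carlo estimate of softmax(x); its error shrinks as O(1/sqrt(T)).
    """
    rng = np.random.default_rng(seed)
    drive = np.exp(x - np.max(x))              # exponentiated drive, shifted for stability
    p = drive / drive.sum()                    # divisive inhibition normalizes the pool
    winners = rng.choice(len(x), size=T, p=p)  # one spike per step (soft winner-take-all)
    counts = np.bincount(winners, minlength=len(x))
    return counts / T                          # spike rates approximate softmax(x)

x = np.array([1.0, 2.0, 0.5])
exact = np.exp(x - x.max()); exact /= exact.sum()
est = spiking_softmax(x, T=20000)
print(np.abs(est - exact).max())  # error on the order of 1/sqrt(T)
```

Doubling the accuracy therefore costs roughly four times the timesteps, which is why the worst-case timestep predictions discussed below grow so quickly with precision.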

Abstract

Spiking transformers achieve competitive accuracy with conventional transformers while offering 38–57× energy efficiency on neuromorphic hardware, yet no theoretical framework guides their design. This paper establishes the first comprehensive expressivity theory for spiking self-attention. We prove that spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, providing explicit spike-circuit constructions including a novel lateral inhibition network for softmax normalization with proven O(1/√T) convergence. We derive tight spike-count lower bounds via rate-distortion theory: ε-approximation requires Ω(L_f² n d / ε²) spikes, with a rigorous information-theoretic derivation. Our key insight is input-dependent bounds using measured effective dimensions (d_eff = 47–89 for CIFAR/ImageNet), explaining why T=4 timesteps suffice despite worst-case predictions of T ≥ 10,000. We provide concrete design rules with calibrated constants (C=2.3, 95% CI: [1.9, 2.7]). Experiments on Spikformer, QKFormer, and SpikingResformer across vision and language benchmarks validate the predictions with R²=0.97 (p<0.001). Our framework provides the first principled foundation for neuromorphic transformer design.
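Neither the abstract nor the key points define "effective dimension" precisely. A common proxy for such a quantity is the participation ratio of the input covariance spectrum, which matches the ambient dimension for isotropic data and is small when variance concentrates in a few directions; that behavior is consistent with d_eff = 47–89 being far below typical embedding widths. A hedged sketch, with `effective_dimension` as an illustrative stand-in rather than the paper's exact definition:

```python
import numpy as np

def effective_dimension(X):
    """Participation ratio of the covariance spectrum: (sum λ)² / sum λ².

    For isotropic data this approaches the ambient dimension; when variance
    concentrates in a few principal directions it is much smaller.  Used here
    as a plausible stand-in for the paper's d_eff, which may be defined
    differently.
    """
    Xc = X - X.mean(axis=0)                 # center the features
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    lam = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
# rank-64 data embedded in 512 ambient dimensions: the participation
# ratio is bounded above by the rank, so it stays far below 512
Z = rng.normal(size=(2000, 64)) @ rng.normal(size=(64, 512))
d_eff = effective_dimension(Z)
print(round(d_eff, 1))  # well below the ambient dimension of 512
```

Under this reading, worst-case timestep bounds scale with the ambient dimension d, while realistic inputs only "pay" for d_eff, which is the gap the paper uses to reconcile T ≥ 10,000 worst-case predictions with T=4 working in practice.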