AI Navigate

Neural Dynamics Self-Attention for Spiking Transformers

arXiv cs.AI / 3/23/2026


Key Points

  • The paper analyzes the integration of Spiking Neural Networks with Transformer architectures and identifies two key limitations of existing Spiking Transformers: a performance gap relative to artificial neural networks and high memory overhead during inference, both attributed to the Spiking Self-Attention (SSA) mechanism.
  • It proposes LRF-Dyn, which imposes localized receptive fields on spiking neurons to emphasize neighboring regions and strengthen local modeling while reducing memory usage.
  • It further removes the need to store large attention matrices by approximating attention with charge-fire-reset dynamics, cutting inference-time memory.
  • Extensive experiments on visual tasks show both memory reduction and performance improvements, establishing LRF-Dyn as a core unit for energy-efficient Spiking Transformers.
  • The findings have practical implications for edge vision deployments and downstream workflows in ML engineering and product planning.
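The charge-fire-reset dynamics referenced in the key points follow the standard leaky integrate-and-fire (LIF) neuron update used throughout the SNN literature. The sketch below is illustrative only; the function name, time constant `tau`, and threshold `v_th` are assumptions for this example, not values from the paper.

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_th=1.0):
    """One charge-fire-reset step of a leaky integrate-and-fire neuron.

    v: membrane potential carried over from the previous timestep
    x: input current at this timestep
    """
    v = v / tau + x                          # charge: leaky integration of input
    spike = (v >= v_th).astype(np.float32)   # fire: emit a binary spike at threshold
    v = v * (1.0 - spike)                    # reset: zero the potential where a spike fired
    return spike, v

# drive three neurons with a constant sub-threshold input for a few timesteps
v = np.zeros(3)
spikes = []
for _ in range(4):
    s, v = lif_step(v, np.array([0.6, 0.6, 0.6]))
    spikes.append(s)
# potential accumulates over steps until it crosses threshold, then resets
```

Because the state is a single membrane-potential vector updated in place, this recurrence is what lets the paper's approach avoid materializing a full attention matrix at inference time.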

Abstract

Integrating Spiking Neural Networks (SNNs) with Transformer architectures offers a promising pathway to balance energy efficiency and performance, particularly for edge vision applications. However, existing Spiking Transformers face two critical challenges: (i) a substantial performance gap compared to their Artificial Neural Network (ANN) counterparts and (ii) high memory overhead during inference. Through theoretical analysis, we attribute both limitations to the Spiking Self-Attention (SSA) mechanism: the lack of locality bias and the need to store large attention matrices. Inspired by the localized receptive fields (LRF) and membrane-potential dynamics of biological visual neurons, we propose LRF-Dyn, which uses spiking neurons with localized receptive fields to compute attention while reducing memory requirements. Specifically, we introduce an LRF method into SSA to assign higher weights to neighboring regions, strengthening local modeling and improving performance. Building on this, we approximate the resulting attention computation via charge-fire-reset dynamics, eliminating explicit attention-matrix storage and reducing inference-time memory. Extensive experiments on visual tasks confirm that our method reduces memory overhead while delivering significant performance improvements. These results establish it as a key unit for achieving energy-efficient Spiking Transformers.
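The locality-bias idea in the abstract, assigning higher attention weights to neighboring regions, can be sketched with a distance-based weighting over token positions. This is a minimal illustration of the general technique, not the paper's actual formulation: the Gaussian weighting, `sigma`, and function names here are all hypothetical choices for the example.

```python
import numpy as np

def local_bias(n, sigma=2.0):
    # Gaussian weights over pairwise position distances:
    # positions near each query token get weights close to 1,
    # distant positions are smoothly suppressed.
    idx = np.arange(n)
    d2 = (idx[:, None] - idx[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def local_spiking_attention(q_spikes, k_spikes, v_spikes, sigma=2.0):
    """Toy spike-based attention with a locality bias.

    q_spikes, k_spikes, v_spikes: binary spike tensors of shape (n_tokens, d).
    """
    scores = q_spikes @ k_spikes.T                        # integer-valued spike correlations
    scores = scores * local_bias(len(q_spikes), sigma)    # emphasize neighboring tokens
    return scores @ v_spikes
```

Note that this toy version still builds the full `n × n` score matrix; the paper's contribution is precisely to approximate this computation with the recurrent charge-fire-reset dynamics so that no such matrix needs to be stored at inference time.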