LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel

arXiv cs.CV / 4/23/2026


Key Points

  • The paper proposes LaplacianFormer, a Transformer variant that replaces softmax attention with a Laplacian kernel, targeting high-resolution vision workloads where softmax’s quadratic complexity becomes prohibitive.
  • It argues that prior linear-attention methods using Gaussian-kernel approximations lack solid theoretical justification and can suppress mid-range token interactions.
  • To mitigate expressiveness loss from low-rank approximations, the method introduces a provably injective feature map that preserves fine-grained token information.
  • Efficient computation is achieved via a Nyström approximation of the kernel matrix and a Newton–Schulz iteration-based solver, avoiding expensive matrix inversion and SVD.
  • The authors report custom CUDA kernels for forward and backward passes and show on ImageNet that LaplacianFormer improves the performance–efficiency trade-off while enhancing attention expressiveness.
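
As a rough illustration of the pipeline the bullets describe, the NumPy sketch below applies Laplacian-kernel attention through a Nyström approximation, so the full n×n kernel matrix is never materialized. The L1 norm inside the kernel, the random landmark selection, and the softmax-style row normalization are assumptions made here for illustration; the paper's actual choices may differ, and it replaces the pseudo-inverse below with a Newton–Schulz solver.

```python
import numpy as np

def laplacian_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||_1); the exact norm and bandwidth
    # used by the paper are assumptions here.
    dists = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)  # (n, m)
    return np.exp(-gamma * dists)

def nystrom_attention(Q, K, V, m=16, gamma=1.0, seed=0):
    """Linear-time attention sketch via a Nystrom approximation:
    A ~= K_qm @ inv(K_mm) @ K_mk, applied to V by associativity so the
    cost is O(n*m) rather than O(n^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)  # landmark tokens
    L = K[idx]                                           # (m, d) landmarks
    K_qm = laplacian_kernel(Q, L, gamma)                 # (n, m)
    K_mm = laplacian_kernel(L, L, gamma)                 # (m, m)
    K_mk = laplacian_kernel(L, K, gamma)                 # (m, n)
    P = np.linalg.pinv(K_mm)  # the paper avoids this via Newton-Schulz
    out = K_qm @ (P @ (K_mk @ V))
    # Normalize each query's weights to sum to ~1, softmax-style.
    denom = K_qm @ (P @ K_mk.sum(axis=1, keepdims=True))
    return out / np.maximum(denom, 1e-6)
```

With m equal to the sequence length the approximation becomes exact (the landmarks span all keys); smaller m trades accuracy for the linear cost the paper is after.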

Abstract

The quadratic complexity of softmax attention presents a major obstacle for scaling Transformers to high-resolution vision tasks. Existing linear attention variants often replace the softmax with Gaussian kernels to reduce complexity, but such approximations lack theoretical grounding and tend to oversuppress mid-range token interactions. We propose LaplacianFormer, a Transformer variant that employs a Laplacian kernel as a principled alternative to softmax, motivated by empirical observations and theoretical analysis. To address expressiveness degradation under low-rank approximations, we introduce a provably injective feature map that retains fine-grained token information. For efficient computation, we adopt a Nyström approximation of the kernel matrix and solve the resulting system using Newton–Schulz iteration, avoiding costly matrix inversion and SVD. We further develop custom CUDA implementations for both the kernel and solver, enabling high-throughput forward and backward passes suitable for edge deployment. Experiments on ImageNet show that LaplacianFormer achieves strong performance–efficiency trade-offs while improving attention expressiveness.
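
The solver step mentioned in the abstract can be illustrated with the textbook Newton–Schulz iteration, which approximates a matrix inverse using only matrix multiplications and therefore maps well onto GPU kernels. The initialization and iteration count below are standard choices from the numerical-linear-algebra literature, not details taken from the paper.

```python
import numpy as np

def newton_schulz_inverse(A, iters=30):
    """Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k), converging
    quadratically to A^{-1} whenever ||I - A X_0|| < 1. No explicit
    inversion or SVD is needed; every step is a matmul."""
    n = A.shape[0]
    # Classic initialization that guarantees convergence for any
    # nonsingular A (Pan & Schreiber-style scaling).
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(n)
    for _ in range(iters):
        X = X @ (2.0 * I - A @ X)
    return X
```

For the symmetric positive-definite kernel matrices arising from a Nyström approximation, the iteration converges reliably; the quadratic rate means a few tens of iterations already reach machine precision.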