Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention

arXiv cs.CL / 4/3/2026


Key Points

  • The paper introduces Stochastic Attention (SA), a connectome-inspired technique that randomly permutes the token order before applying sliding-window attention, then restores the original order afterward.
  • SA effectively converts a fixed local window into a stochastic global routing mechanism while keeping the same per-layer computational budget of O(nw).
  • By sampling independent permutations across depth, SA yields exponentially expanding receptive fields, reaching full sequence coverage in O(log_w n) layers instead of O(n/w) for standard sliding-window attention.
  • Experiments show SA improves pre-training of language models (with gated SA+SWA performing best for average zero-shot accuracy) and boosts training-free inference on Qwen3-8B and Qwen3-30B-A3B, outperforming SWA and matching/exceeding Mixture of Block Attention under similar compute.
  • The authors argue that stochastic routing inspired by brain connectomics is a practical, drop-in attention primitive that complements existing efficient attention methods (linear/sparse).
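
The permute → windowed-attention → un-permute mechanism described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the windowed op here is a simple causal mean over a window of size `w`, standing in for real sliding-window attention, and the function names are invented for this sketch.

```python
import numpy as np

def sliding_window_mix(x, w):
    # Toy stand-in for causal sliding-window attention: each position
    # aggregates (here: averages) over the previous w positions.
    n = len(x)
    out = np.empty(n, dtype=float)
    for i in range(n):
        lo = max(0, i - w + 1)
        out[i] = x[lo:i + 1].mean()
    return out

def stochastic_attention(x, w, rng):
    # SA wrapper (sketch): sample a random permutation, apply the
    # windowed op in permuted order, then restore the original order.
    # The local window thus mixes a random subset of the sequence,
    # at the same O(n*w) cost as plain sliding-window attention.
    perm = rng.permutation(len(x))
    inv = np.argsort(perm)  # inverse permutation
    return sliding_window_mix(x[perm], w)[inv]
```

Because only the ordering changes, the per-layer compute budget is identical to SWA; with `w = 1` the op degenerates to the identity regardless of the sampled permutation.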

Abstract

The whole-brain connectome of a fruit fly comprises over 130K neurons connected with a probability of merely 0.02%, yet achieves an average shortest path of only 4.4 hops. Despite being highly structured at the circuit level, the network's long-range connections are broadly distributed across brain regions, functioning as stochastic shortcuts that enable efficient global communication. Inspired by this observation, we propose Stochastic Attention (SA), a drop-in enhancement for sliding-window attention (SWA) that applies a random permutation to the token sequence before windowed attention and restores the original order afterward. This transforms the fixed local window into a stochastic global one within the same O(nw) per-layer budget. Through depth, independently sampled permutations yield exponentially growing receptive fields, achieving full sequence coverage in O(log_w n) layers versus O(n/w) for SWA. We validate SA in two settings: pre-training language models from scratch, where a gated SA + SWA combination achieves the best average zero-shot accuracy, and training-free inference on Qwen3-8B and Qwen3-30B-A3B, where SA consistently outperforms SWA and matches or exceeds Mixture of Block Attention at comparable compute budgets. These results suggest that connectome-inspired stochastic routing is a practical primitive for improving the expressivity of efficient attention, complementary to existing linear and sparse approaches.
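
The O(log_w n) versus O(n/w) coverage claim can be illustrated with a small reachability simulation (a toy model written for this summary, not the paper's code): track, for each position, which source tokens can influence it after stacking window-w mixing layers, with and without per-layer random permutations.

```python
import numpy as np

def coverage_after_layers(n, w, layers, permute, seed=0):
    # R[i, j] = True means token j can influence position i.
    # Each layer applies a causal window-w union, optionally in a
    # freshly sampled random order (the SA shortcut mechanism).
    rng = np.random.default_rng(seed)
    R = np.eye(n, dtype=bool)
    for _ in range(layers):
        order = rng.permutation(n) if permute else np.arange(n)
        inv = np.argsort(order)
        Rp = R[order]                         # permute positions
        new = np.zeros_like(Rp)
        for i in range(n):
            lo = max(0, i - w + 1)
            new[i] = Rp[lo:i + 1].any(axis=0) # union over the window
        R = new[inv]                          # restore original order
    return R.sum(axis=1).mean() / n           # mean fractional coverage
```

Without permutations, each layer extends the receptive field by at most w - 1 positions, so coverage grows linearly in depth (the O(n/w) regime); with independent permutations, receptive-field unions compound across layers and coverage saturates after only a logarithmic number of layers.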