Where to Bind Matters: Hebbian Fast Weights in Vision Transformers for Few-Shot Character Recognition

arXiv cs.CV / 5/6/2026

💬 Opinion · Models & Research

Key Points

  • The paper studies how adding Hebbian Fast-Weight (HFW) modules to vision transformers enables rapid, episode-level adaptation that standard slow-weight transformers lack at inference time (a minimal sketch of such a module follows this list).
  • Experiments integrate HFW into multiple transformer backbones (ViT-Small, DeiT-Small, and Swin-Tiny) and evaluate six variants on Omniglot 5-way 1-shot and 5-way 5-shot tasks under a prototypical network meta-learning setup.
  • For Swin-Tiny, the authors find that applying a single HFW module to the final stage feature map (after hierarchical stages complete) avoids training instability seen when placing Hebbian modules at multiple stages.
  • This placement achieves the best accuracy across all evaluated models, reaching 96.2% (1-shot) and 99.2% (5-shot), with a reported +0.3 percentage point improvement over the non-Hebbian baseline at 1-shot.
  • The study analyzes why Swin's shifted-window inductive bias interacts effectively with episode-level Hebbian binding, why per-block HFW placement fails for ViT and DeiT in low-data regimes, and connects the findings to the fast- and slow-weight meta-learning literature.
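
To make the mechanism concrete, here is a minimal PyTorch sketch of what an HFW module could look like. It assumes an outer-product Hebbian rule over learned key/value projections; the class name, the key normalization, and the learning rate `eta` are illustrative assumptions, since the summary does not specify the paper's exact update rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HebbianFastWeight(nn.Module):
    """Illustrative Hebbian fast-weight (HFW) module.

    The key/value projections are ordinary slow weights, trained by
    gradient descent across episodes. The fast-weight matrix ``A`` is an
    episode-local associative memory: reset each episode, written with
    outer products of (key, value) pairs, and read at inference time.
    """

    def __init__(self, dim: int, eta: float = 0.1):
        super().__init__()
        self.key = nn.Linear(dim, dim, bias=False)    # slow weights
        self.value = nn.Linear(dim, dim, bias=False)  # slow weights
        self.eta = eta                                # Hebbian rate (assumed value)
        self.A = None                                 # fast weights, one episode's lifetime

    def write(self, feats: torch.Tensor) -> None:
        """Hebbian write over all tokens: A = eta * sum_t v_t k_t^T."""
        k = F.normalize(self.key(feats), dim=-1)      # (tokens, dim)
        v = self.value(feats)                         # (tokens, dim)
        self.A = self.eta * v.t() @ k                 # (dim, dim) associative memory

    def read(self, feats: torch.Tensor) -> torch.Tensor:
        """Associative read-out, added residually to the slow features."""
        k = F.normalize(self.key(feats), dim=-1)
        return feats + k @ self.A.t()                 # retrieve bound values
```

The write/read split reflects the episodic setting described above: the transient memory is built once per episode and discarded afterwards, while only the slow projections persist across episodes.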

Abstract

Standard transformer architectures learn fixed slow-weight representations during training and lack mechanisms for rapid adaptation within an episode. In contrast, biological neural systems address this through fast synaptic updates that form transient associative memories during inference, a property known as Hebbian plasticity. In this paper, we conduct an empirical study of Hebbian Fast-Weight (HFW) modules integrated into multiple transformer backbones, including ViT-Small, DeiT-Small, and Swin-Tiny. We evaluate six model variants (ViT, DeiT, Swin, ViT-Hebbian, DeiT-Hebbian, and Swin-Hebbian) on 5-way 1-shot and 5-way 5-shot classification tasks using the Omniglot benchmark under a Prototypical Network meta-learning framework. We propose a single-module placement strategy for Swin-Tiny in which one HFW module is applied to the final-stage feature map after all hierarchical stages have completed. This design avoids the training instability caused by placing separate Hebbian modules at each stage and achieves the highest test accuracy across all six models (96.2% at 1-shot; 99.2% at 5-shot), outperforming its non-Hebbian baseline by +0.3 percentage points at 1-shot. We analyze the interaction between Swin's shifted-window inductive bias and episode-level Hebbian binding, discuss why per-block placement fails for the ViT and DeiT variants in a low-data regime, and situate the results within the wider literature on fast- and slow-weight meta-learning.
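
As a companion to the abstract, the sketch below shows how the single-module placement and the Prototypical Network evaluation could fit together in one episode: support features from the final backbone stage write the fast weights, queries read them, and classification is by nearest class prototype. `backbone` stands in for Swin-Tiny truncated after its last stage, `hfw` is the module sketched earlier, and `episode_accuracy` with its mean-pooling choice is hypothetical glue code, not the authors' implementation.

```python
import torch


def episode_accuracy(backbone, hfw, support, support_y, query, query_y, n_way):
    """One N-way episode with a single HFW module on the final-stage
    feature map, classified by a Prototypical Network head. ``backbone``
    is assumed to return (images, tokens, dim) final-stage features."""
    s_feats = backbone(support)                       # (n_support, tokens, dim)
    hfw.write(s_feats.flatten(0, 1))                  # bind the whole support set

    def embed(f):                                     # HFW read, then token pooling
        return hfw.read(f.flatten(0, 1)).view(*f.shape[:2], -1).mean(dim=1)

    s_emb, q_emb = embed(s_feats), embed(backbone(query))
    # Class prototypes = mean support embedding per class (Snell et al., 2017).
    protos = torch.stack([s_emb[support_y == c].mean(dim=0) for c in range(n_way)])
    pred = torch.cdist(q_emb, protos).argmin(dim=1)   # nearest-prototype decision
    return (pred == query_y).float().mean().item()
```

Placing `hfw` only after the backbone's final stage mirrors the paper's single-module strategy: the hierarchical Swin stages run unchanged, and the Hebbian binding operates once, on the completed feature map, rather than inside every block.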