Multi-Frequency Local Plasticity for Visual Representation Learning

arXiv cs.CV / April 14, 2026


Key Points

  • The paper proposes “Multi-Frequency Local Plasticity,” a modular visual representation learning framework that largely avoids end-to-end backprop by using fixed multi-frequency Gabor decomposition and local (Hebbian/Oja) plasticity rules.
  • It combines within-stream competitive learning (with anti-Hebbian decorrelation) with an associative memory module inspired by modern Hopfield retrieval, and adds iterative top-down modulation driven by local prediction/reconstruction signals.
  • Only a small set of parameters—specifically the final linear readout and top-down projection matrices—are trained with gradient descent, while most representational layers rely on local learning updates.
  • On CIFAR-10, the full model achieves 80.1% ± 0.3% top-1 accuracy with a linear probe, outperforming a Hebbian-only baseline (71.0%) but trailing the gradient-trained reference on the same fixed Gabor basis (83.4%).
  • Factorial analysis suggests each component (multi-frequency streams, associative memory, top-down feedback) contributes largely additively, with a statistically significant interaction between streams and top-down modulation (p=0.02), though experiments are limited to CIFAR-10/100.
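The local plasticity rules named in the bullets (Hebbian and Oja updates) have a standard form; the sketch below is a minimal illustration of an Oja update for a single linear unit, not the paper's implementation (the learning rate, layer sizes, and the `oja_update` helper are all assumptions for illustration):

```python
import numpy as np

def oja_update(W, x, lr=0.005):
    """One local Oja update for a linear layer y = W @ x (illustrative sketch).

    The Hebbian term y_i * x_j is balanced by a decay term y_i^2 * W_ij,
    which keeps weight norms bounded without any backpropagated gradient.
    """
    y = W @ x
    dW = lr * (np.outer(y, x) - (y[:, None] ** 2) * W)
    return W + dW

# Toy demonstration: with input variance concentrated on one axis,
# repeated local updates align the weights with that principal direction.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 4))
W /= np.linalg.norm(W)
for _ in range(4000):
    x = rng.normal(size=4)
    x[0] *= 3.0  # dominant variance along the first input dimension
    W = oja_update(W, x)
```

Because the update uses only quantities available at the layer itself (its input and its own output), it fits the paper's framing of predominantly local training, with gradients reserved for the readout and top-down projections.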

Abstract

We study how far structured architectural bias can compensate for the absence of end-to-end gradient-based representation learning in visual recognition. Building on the VisNet tradition, we introduce a modular hierarchical framework combining: (i) fixed multi-frequency Gabor decomposition into F=7 parallel streams; (ii) within-stream competitive learning with Hebbian and Oja updates and anti-Hebbian decorrelation; (iii) an associative memory module inspired by modern Hopfield retrieval; and (iv) iterative top-down modulation using local prediction and reconstruction signals. Representational layers are trained without end-to-end backpropagation through the full hierarchy; only the final linear readout and top-down projection matrices are optimized by gradient descent. We therefore interpret the model as a hybrid system that is predominantly locally trained but includes a small number of gradient-trained parameters. On CIFAR-10, the full model reaches 80.1% ± 0.3% top-1 accuracy (linear probe), compared with 71.0% for a Hebbian-only baseline and 83.4% for a gradient-trained model on the same fixed Gabor basis. On CIFAR-100, performance is 54.8%. Factorial analysis indicates that multi-frequency streams, associative memory, and top-down feedback contribute largely additively, with a significant Streams × TopDown interaction (p=0.02). These results suggest that carefully chosen architectural priors can recover a substantial fraction of the performance typically associated with global gradient training, while leaving a measurable residual gap. Experiments are limited to CIFAR-10/100.
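The "modern Hopfield retrieval" the abstract references has a well-known update rule (iterated softmax attention over stored patterns); the following is a generic sketch of that retrieval step under assumed shapes and an assumed inverse-temperature `beta`, not the paper's module:

```python
import numpy as np

def hopfield_retrieve(patterns, query, beta=8.0, steps=3):
    """Modern-Hopfield-style retrieval (illustrative sketch).

    patterns: (N, d) array of stored patterns; query: (d,) noisy cue.
    Each step replaces the state with a softmax-weighted convex
    combination of the stored patterns, sharpening toward the
    pattern most similar to the cue.
    """
    xi = query
    for _ in range(steps):
        logits = beta * (patterns @ xi)      # similarity to each stored pattern
        p = np.exp(logits - logits.max())    # numerically stable softmax
        p /= p.sum()
        xi = p @ patterns                    # convex combination of patterns
    return xi

# Toy demonstration: a noisy cue is cleaned up toward its stored pattern.
rng = np.random.default_rng(1)
P = rng.normal(size=(5, 16))
P /= np.linalg.norm(P, axis=1, keepdims=True)  # unit-norm stored patterns
noisy = P[2] + 0.2 * rng.normal(size=16)
out = hopfield_retrieve(P, noisy)
```

With high `beta`, the fixed point is essentially the nearest stored pattern, which is what makes this form of associative memory usable as a denoising module inside a locally trained hierarchy.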