Untrained CNNs Match Backpropagation at V1: A Systematic RSA Comparison of Four Learning Rules Against Human fMRI

arXiv cs.LG / 4/21/2026


Key Points

  • The study systematically compares four learning rules—backpropagation, feedback alignment, predictive coding, and STDP—using identical CNN architectures and evaluates alignment to human visual cortex via representational similarity analysis (RSA) on THINGS-fMRI data.
  • A key result is that early visual cortex alignment (V1/V2) is largely determined by the network architecture rather than the learning rule, with an untrained random-weight CNN performing similarly to backpropagation.
  • At higher visual areas (LOC/IT), differences emerge: backpropagation yields superior alignment at LOC/IT, while predictive coding with local Hebbian updates matches backpropagation at IT.
  • Feedback alignment underperforms, producing representations below the random baseline in V1, and the findings remain robust after controlling for pixel-level similarity.
  • Overall, the authors conclude that the relationship between learning rules and cortical alignment is region-specific: architecture drives early alignment, whereas supervised objectives matter most for alignment in late (higher) visual areas.
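
The RSA pipeline the study relies on can be sketched in a few lines: build a representational dissimilarity matrix (RDM) for the model layer and for the fMRI ROI, then compare the two with Spearman correlation. This is a minimal illustration only; the array sizes (other than the 720 stimuli) and the use of correlation distance are assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Condensed RDM: 1 - Pearson correlation between each pair of
    stimulus patterns (rows = stimuli, columns = features/voxels)."""
    return pdist(patterns, metric="correlation")

def rsa_score(model_acts, fmri_acts):
    """Spearman rho between the two RDMs' pairwise dissimilarities."""
    rho, _ = spearmanr(rdm(model_acts), rdm(fmri_acts))
    return rho

# Toy data: 720 stimuli (as in THINGS-fMRI); feature and voxel
# counts below are hypothetical placeholders.
rng = np.random.default_rng(0)
model = rng.standard_normal((720, 512))  # model layer activations
fmri = rng.standard_normal((720, 200))   # ROI voxel responses
print(rsa_score(model, fmri))
```

In practice the fMRI RDMs would come from beta estimates per stimulus and subject, and scores would be compared against a noise ceiling rather than read off raw.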

Abstract

A central question in computational neuroscience is whether the learning rule used to train a neural network determines how well its internal representations align with those of the human visual cortex. We present a systematic comparison of four learning rules -- backpropagation (BP), feedback alignment (FA), predictive coding (PC), and spike-timing-dependent plasticity (STDP) -- applied to identical convolutional architectures and evaluated against human fMRI data from the THINGS-fMRI dataset (720 stimuli, 3 subjects) using Representational Similarity Analysis (RSA). Crucially, we include an untrained random-weights baseline that reveals the dominant role of architecture. We find that early visual alignment (V1/V2) is primarily architecture-driven: an untrained CNN achieves rho = 0.071, statistically indistinguishable from BP (rho = 0.072, p = 0.43). Learning rules only differentiate at higher visual areas: BP dominates at LOC/IT, and PC with local Hebbian updates achieves IT alignment statistically indistinguishable from BP (p = 0.18). FA consistently impairs representations below the random baseline at V1. Partial RSA confirms all effects survive pixel-similarity control. These results demonstrate that the relationship between learning rules and cortical alignment is region-specific: architecture determines early alignment, while supervised objectives drive late alignment.
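
The abstract's "partial RSA" control can be sketched as a partial Spearman correlation: rank-transform the model, brain, and pixel RDMs, regress the pixel ranks out of the first two, and correlate the residuals. This is a hedged sketch of one standard way to compute it, not the paper's verified implementation; the stimulus counts and distance metric below are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def partial_rsa(model_rdm, brain_rdm, pixel_rdm):
    """Partial Spearman correlation between model and brain RDMs,
    controlling for a pixel-level RDM (all in condensed form)."""
    rm, rb, rp = (rankdata(x) for x in (model_rdm, brain_rdm, pixel_rdm))
    # Design matrix: intercept + pixel ranks; regress it out of both sides.
    P = np.column_stack([np.ones_like(rp), rp])
    resid = lambda y: y - P @ np.linalg.lstsq(P, y, rcond=None)[0]
    em, eb = resid(rm), resid(rb)
    return float(em @ eb / (np.linalg.norm(em) * np.linalg.norm(eb)))

# Toy example (hypothetical sizes; the paper uses 720 stimuli).
rng = np.random.default_rng(1)
pixel_rdm = pdist(rng.standard_normal((50, 100)), "correlation")
model_rdm = pdist(rng.standard_normal((50, 64)), "correlation")
brain_rdm = pdist(rng.standard_normal((50, 30)), "correlation")
print(partial_rsa(model_rdm, brain_rdm, pixel_rdm))
```

If an effect survives this control, model-brain similarity is not merely inherited from low-level image similarity, which is the check the abstract refers to.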