Measuring the Representational Alignment of Neural Systems in Superposition

arXiv cs.LG / 4/2/2026


Key Points

  • The paper shows that common neural representational alignment metrics (e.g., RSA, CKA, and linear regression) can be systematically deflated when networks encode features in superposition rather than one-to-one per neuron.
  • It argues that misalignment scores primarily reflect differences in the systems’ superposition (projection/mixing) matrices rather than differences in the underlying latent features.
  • Under partial feature overlap, the metric bias can even invert expected rankings, making less-shared-feature systems appear more aligned than more-shared-feature systems.
  • The authors emphasize that superposition need not destroy information: compressed sensing shows the original features remain recoverable from the compressed activity as long as they are sparse.
  • They conclude that accurate comparison of neural systems in superposition requires extracting and aligning the underlying features instead of comparing raw activation mixtures.
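The deflation effect in the first two points can be demonstrated in a few lines. The sketch below is illustrative, not the paper's code: it assumes random Gaussian mixing matrices, synthetic sparse features, and hypothetical dimensions (64 features compressed into 16 neurons). Two systems with identical latent features but independent superposition matrices score well below 1 under linear CKA, while the same features through the same matrix score exactly 1.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X, Y (samples x units)."""
    X = X - X.mean(axis=0)          # center along the sample axis
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
n_samples, n_features, n_neurons = 2000, 64, 16   # illustrative sizes

# Sparse latent features, identical in both systems (same "content").
Z = rng.standard_normal((n_samples, n_features))
Z *= rng.random((n_samples, n_features)) < 0.05   # ~5% of features active

# Two independent random superposition (mixing) matrices.
W1 = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)
W2 = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)

X1, X2 = Z @ W1, Z @ W2                  # activations in superposition
cka_same_features = linear_cka(X1, X2)   # identical features, different mixing
cka_same_mixing = linear_cka(X1, X1)     # identical features AND mixing -> 1.0

print("different mixing:", cka_same_features)
print("same mixing:     ", cka_same_mixing)
```

Despite both systems encoding exactly the same latent features, the score for the differently-mixed pair is driven by the overlap of `W1` and `W2`, which is small for independent random projections.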

Abstract

Comparing the internal representations of neural networks is a central goal in both neuroscience and machine learning. Standard alignment metrics operate on raw neural activations, implicitly assuming that similar representations produce similar activity patterns. However, neural systems frequently operate in superposition, encoding more features than they have neurons via linear compression. We derive closed-form expressions showing that superposition systematically deflates Representational Similarity Analysis, Centered Kernel Alignment, and linear regression, causing networks with identical feature content to appear dissimilar. The root cause is that these metrics depend not on the latent features themselves but on the cross-similarity between the two systems' superposition matrices, which typically differ substantially when the projections are random: alignment scores conflate what a system represents with how it represents it. Under partial feature overlap, this confound can invert the expected ordering, making systems sharing fewer features appear more aligned than systems sharing more. Crucially, the apparent misalignment need not reflect a loss of information; compressed sensing guarantees that the original features remain recoverable from the lower-dimensional activity, provided they are sparse. We therefore argue that comparing neural systems in superposition requires extracting and aligning the underlying features rather than comparing the raw neural mixtures.