Same Geometry, Opposite Noise: Transformer Magnitude Representations Lack Scalar Variability

arXiv cs.CL / 4/7/2026


Key Points

  • The paper tests whether transformer language models exhibit “scalar variability,” where representational noise scales proportionally with magnitude to yield a constant coefficient of variation seen in biological magnitude systems.
  • Across 26 numerical magnitudes in three 7–8B models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, and Llama-3-8B-Base), the authors find an anti-scalar pattern: representational variability decreases as magnitude increases (scaling exponent alpha ≈ -0.19).
  • The negative scaling persists under multiple checks, including full-dimensional-space analysis (alpha ≈ -0.04) and sentence-identity correction (alpha ≈ -0.007); none of the 16 primary layers shows alpha > 0 in any of the three models (0/16).
  • The anti-scalar effect is reported to be 3–5× stronger along the magnitude axis than in orthogonal dimensions, and corpus frequency substantially predicts per-magnitude variability (rho = 0.84).
  • The authors conclude that standard distributional learning in transformers reproduces some log-compressive magnitude geometry but does not produce the biological constant-CV noise signature.
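To make the "constant coefficient of variation" prediction concrete: CV = sigma / mu, so scalar variability means representational noise grows in proportion to the magnitude itself. A minimal sketch with hypothetical numbers (the Weber-fraction value w = 0.15 is illustrative, not from the paper):

```python
import numpy as np

# Scalar variability (the biological pattern) predicts that noise sigma
# scales proportionally with magnitude mu, so CV = sigma / mu is constant.
w = 0.15                                  # hypothetical Weber fraction
mus = np.array([2.0, 5.0, 10.0, 20.0])    # example magnitudes
sigmas = w * mus                          # noise proportional to magnitude
cvs = sigmas / mus

print(cvs)  # constant CV: 0.15 at every magnitude
```

The paper's finding is that transformer hidden states violate this: dispersion shrinks rather than grows with magnitude, so the CV is not constant.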

Abstract

Scalar variability -- the finding that representational noise scales proportionally with magnitude, producing a constant coefficient of variation -- is a hallmark of biological magnitude systems. We tested whether transformer language models exhibit this property by analysing the dispersion of hidden-state representations across carrier sentences for 26 numerical magnitudes in three 7-8B parameter models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base; data from Cacioli, 2026). We found the opposite: representational variability decreased with magnitude along the magnitude axis (scaling exponent alpha approx -0.19; 0/16 primary layers with alpha > 0, all three models). The negative sign was consistent in full-dimensional space (alpha approx -0.04) and after sentence-identity correction (alpha approx -0.007). The anti-scalar pattern was 3-5x stronger along the magnitude axis than orthogonal dimensions, and corpus frequency strongly predicted per-magnitude variability (rho = .84). These results demonstrate that distributional learning alone is insufficient to produce scalar variability: transformers reproduce log-compressive magnitude geometry but not the constant-CV noise signature observed in biological systems.
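The scaling exponent alpha reported above can be estimated by regressing log-dispersion on log-magnitude, since a power law sigma(m) ∝ m^alpha is linear in log-log space (alpha ≈ 1 would indicate scalar variability; alpha < 0 is the anti-scalar pattern). A minimal sketch on synthetic data generated with the paper's reported exponent, not on the paper's actual hidden states:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dispersions for 26 magnitudes following sigma(m) ∝ m^alpha with
# alpha = -0.19 (the reported exponent), plus small multiplicative noise.
magnitudes = np.arange(1, 27)
true_alpha = -0.19
dispersion = magnitudes.astype(float) ** true_alpha * np.exp(rng.normal(0, 0.02, 26))

# Fit log sigma = alpha * log m + c by ordinary least squares.
alpha, c = np.polyfit(np.log(magnitudes), np.log(dispersion), 1)

print(f"estimated alpha = {alpha:.3f}")  # negative slope => anti-scalar
```

In the paper's analyses this regression is applied per layer to the measured dispersion of hidden states across carrier sentences, along the magnitude axis and in orthogonal dimensions.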