Weight Tying Biases Token Embeddings Towards the Output Space
arXiv cs.CL / 3/30/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper investigates how weight tying (sharing one parameter matrix between the input embedding and the output unembedding) shapes the embedding space in language models, finding that the tied matrix aligns more closely with the unembedding/output space of comparable untied models than with their input embeddings (see the weight-tying sketch after these key points).
- It argues the tied matrix becomes biased toward output prediction because, early in training, gradients from the output-prediction side dominate the gradients needed to support input representations.
- Using tuned lens analysis, the authors show this bias harms early-layer computations that feed into the residual stream, reducing their effectiveness.
- The study provides causal evidence for this gradient-imbalance mechanism by showing that scaling up input-side gradients during training reduces the unembedding bias (see the gradient-scaling sketch below).
- The results offer a mechanistic explanation for why weight tying can degrade performance at scale and suggest implications for smaller LLMs where the embedding matrix is a larger fraction of total parameters.
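
For readers less familiar with the setup, here is a minimal PyTorch-style sketch (not the paper's code) of weight tying: a single parameter matrix both embeds input token ids and projects final hidden states to vocabulary logits, so gradients from both roles accumulate in the same tensor. The model shape and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyTiedLM(nn.Module):
    """Toy language model illustrating weight tying."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)              # input embedding
        self.body = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)   # unembedding
        # Weight tying: both modules now point at one shared parameter tensor,
        # so input-side and output-side gradients accumulate into the same matrix.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids)    # input-side use of the shared matrix
        h = self.body(h)
        return self.lm_head(h)       # output-side use: logits over the vocabulary
```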
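The causal intervention in the fourth point can be approximated with a gradient-scaling trick: leave the forward pass unchanged but multiply the gradient flowing back through the input-embedding path by a factor. This is only a sketch of one way such an intervention could be implemented; the paper's actual procedure and the factor `alpha` are assumptions here.

```python
def scale_grad(x: torch.Tensor, alpha: float) -> torch.Tensor:
    """Identity in the forward pass; scales the backward gradient by alpha."""
    return alpha * x + (1.0 - alpha) * x.detach()

class TinyTiedLMScaled(TinyTiedLM):
    """Same tied model, but the input-side gradient contribution to the
    shared matrix is rescaled by `alpha` (alpha > 1 boosts it, < 1 damps it)."""

    def __init__(self, vocab_size: int, d_model: int, alpha: float = 2.0):
        super().__init__(vocab_size, d_model)
        self.alpha = alpha

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = scale_grad(self.embed(token_ids), self.alpha)  # only this path is rescaled
        h = self.body(h)
        return self.lm_head(h)  # output-side gradients are left untouched
```

Under this kind of rebalancing, increasing `alpha` strengthens the input-representation gradients relative to the output-prediction gradients reaching the shared matrix, which is the direction of intervention the paper reports reduces the unembedding bias.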