Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics
arXiv cs.CL / 4/3/2026
Key Points
- The paper argues that apparent "superposition" (neurons shared across concepts) can be inflated by a lexical confound: neurons respond to the same surface word form (e.g., "bank") across its different meanings, rather than genuinely compressing unrelated concepts.
- Using a 2×2 factorial decomposition that separates lexical identity from meaning, it finds that the lexical-only overlap signal consistently exceeds the semantic-only overlap signal in models ranging from 110M to 70B parameters.
- The lexical confound also appears in sparse autoencoders, where 18–36% of features blend multiple senses; the lexical component accounts for a small but nontrivial fraction of activation dimensions (≤1%).
- Removing the lexical component improves word sense disambiguation and makes knowledge edits more selective (reported p = 0.002).
- The results suggest that superposition metrics should explicitly account for lexical identity effects to avoid misattributing overlap to mechanistic compression.
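The summary does not say how the 2×2 decomposition is computed, so the sketch below is only one plausible reading. It crosses "same word form vs. different word form" with "same sense vs. different sense", measures neuron overlap in each cell, and subtracts the different-form/different-sense cell as a baseline. The overlap metric (Jaccard overlap of the top-k most active neurons), the helper names `top_k_neurons`, `jaccard`, `mean_overlap` and `factorial_decomposition`, and the toy activations are all hypothetical illustrations, not the paper's method or data.

```python
from statistics import mean

def top_k_neurons(activations, k):
    """Indices of the k largest-magnitude activations (ties broken by lower index)."""
    order = sorted(range(len(activations)), key=lambda i: (-abs(activations[i]), i))
    return set(order[:k])

def jaccard(a, b):
    """Jaccard overlap between two sets of neuron indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_overlap(pairs, k):
    """Mean top-k Jaccard overlap across (activation_x, activation_y) pairs."""
    return mean(jaccard(top_k_neurons(x, k), top_k_neurons(y, k)) for x, y in pairs)

def factorial_decomposition(cells, k):
    """Hypothetical split of overlap into lexical-only, semantic-only and interaction terms.

    `cells` maps (same_word_form, same_sense) -> list of activation pairs.
    The different-form / different-sense cell is used as the baseline.
    """
    m = {cond: mean_overlap(pairs, k) for cond, pairs in cells.items()}
    base = m[(False, False)]
    return {
        "lexical_only": m[(True, False)] - base,   # same form, different sense
        "semantic_only": m[(False, True)] - base,  # different form, same sense
        "interaction": m[(True, True)] - m[(True, False)] - m[(False, True)] + base,
    }

# Toy 6-neuron activations (hypothetical): neuron 0 fires for the form "bank",
# 1 for "shore", 2 for "money"; neuron 3 for the river sense, 4 for the finance sense.
bank_river   = [0.9, 0.0, 0.0, 0.7, 0.0, 0.1]
bank_river_2 = [0.8, 0.0, 0.0, 0.6, 0.0, 0.2]
bank_finance = [0.9, 0.0, 0.0, 0.0, 0.7, 0.1]
shore_river  = [0.0, 0.9, 0.0, 0.7, 0.0, 0.1]
money_fin    = [0.0, 0.0, 0.9, 0.0, 0.7, 0.1]

cells = {
    (True, True):   [(bank_river, bank_river_2)],
    (True, False):  [(bank_river, bank_finance)],
    (False, True):  [(bank_river, shore_river)],
    (False, False): [(bank_river, money_fin)],
}

if __name__ == "__main__":
    for name, value in factorial_decomposition(cells, k=2).items():
        print(f"{name}: {value:.3f}")
```

Top-k Jaccard is used here only because it is easy to read. A real analysis would average over many context pairs per cell and might use a different overlap measure. Under this framing, the paper's headline finding corresponds to `lexical_only` exceeding `semantic_only`.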