Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics

arXiv cs.CL / 4/3/2026

Key Points

  • The paper argues that apparent “superposition” neuron overlap can be inflated by lexical confounds when the same surface word form (e.g., “bank”) activates different meanings rather than truly compressing unrelated concepts.
  • Using a 2x2 factorial decomposition, it finds the lexical-only overlap signal consistently exceeds the semantic-only overlap signal across models from 110M to 70B parameters.
  • The lexical confound also appears in sparse autoencoders, where 18–36% of features blend senses, and it is concentrated in a small fraction of activation dimensions (≤1%).
  • Removing the lexical component improves word sense disambiguation and makes knowledge edits more selective, with reported statistical evidence (p = 0.002).
  • The results suggest that superposition metrics should explicitly account for lexical identity effects to avoid misattributing overlap to mechanistic compression.
Abstract

If the same neuron activates for both "lender" and "riverside," standard metrics attribute the overlap to superposition: the neuron must be compressing two unrelated concepts. This work asks how much of that overlap is instead due to a lexical confound: neurons firing for a shared word form (such as "bank") rather than for two compressed concepts. A 2x2 factorial decomposition reveals that the lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning) across models spanning 110M–70B parameters. The confound carries into sparse autoencoders (18–36% of features blend senses), sits in ≤1% of activation dimensions, and hurts downstream tasks: filtering it out improves word sense disambiguation and makes knowledge edits more selective (p = 0.002).
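The 2x2 decomposition contrasts activation overlap for same-word/different-meaning pairs against different-word/same-meaning pairs. A minimal sketch of that comparison, using random vectors in place of real model activations (all variable names and the cosine-overlap metric are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def overlap(a, b):
    """Cosine similarity between two neuron-activation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in activation vectors for the four probe conditions
# (real usage would extract these from a model's hidden states):
d = 512
bank_finance = rng.normal(size=d)  # "bank" in the lender sense
bank_river   = rng.normal(size=d)  # "bank" in the riverside sense
lender       = rng.normal(size=d)  # different word, lender sense

# Lexical-only cell: same surface form, different meaning.
lexical_only = overlap(bank_finance, bank_river)
# Semantic-only cell: different surface form, same meaning.
semantic_only = overlap(bank_finance, lender)

# The paper's finding is that the lexical-only signal exceeds the
# semantic-only signal across model scales; with random vectors both
# values are near zero, so this sketch only shows the metric's shape.
print(lexical_only, semantic_only)
```

With real activations, comparing these two cells separates overlap driven by word identity from overlap driven by shared meaning, which is the confound the paper isolates.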
