Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
arXiv cs.AI / 4/21/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper proposes a parameter-free decomposition for Mixture-of-Experts (MoE) models that separates each layer’s representation into a control signal for routing and an orthogonal content channel that the router cannot see (see the first sketch after this list).
- Experiments across six MoE architectures show that the content channel retains surface-level properties like language, token identity, and position, while the control signal captures an abstract function that evolves across layers.
- Because routing decisions are low-bandwidth, the mechanism encourages compositional specialization, making expert paths effectively monosemantic even if individual experts remain polysemantic.
- The study finds that the same token can follow different trajectories depending on its semantic role (e.g., colon as type annotation vs. punctuation vs. time separator), and that clusters are more monosemantic in the control subspace than in the full representation.
- The authors conclude that, for interpretability in MoEs, the more natural unit of analysis is the token trajectory (its route over layers) rather than the expert itself (see the second sketch below).
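
To make the first key point concrete, here is a minimal sketch of the control/content split, assuming a standard MoE router that scores experts with a linear map over the hidden state. The names (`split_control_content`, `W_r`, `hidden`) are illustrative choices, not the paper's notation.

```python
import numpy as np

def split_control_content(hidden: np.ndarray, W_r: np.ndarray):
    """Split a hidden state into the router-visible 'control' component
    (its projection onto the row space of the router matrix) and the
    orthogonal 'content' component that the router cannot see.

    hidden: (d,) layer representation.
    W_r:    (num_experts, d) router weight matrix.
    """
    # Orthonormal basis for the row space of W_r via QR on its transpose.
    Q, _ = np.linalg.qr(W_r.T)          # Q: (d, num_experts)
    control = Q @ (Q.T @ hidden)        # projection onto the router subspace
    content = hidden - control          # orthogonal complement
    return control, content

# Routing scores depend only on the control component:
rng = np.random.default_rng(0)
d, n_experts = 64, 8
W_r = rng.standard_normal((n_experts, d))
h = rng.standard_normal(d)
ctrl, cont = split_control_content(h, W_r)
assert np.allclose(W_r @ h, W_r @ ctrl)         # router sees only 'control'
assert np.allclose(W_r @ cont, 0, atol=1e-10)   # 'content' is invisible to it
```

The projection is parameter-free in the sense that it is computed directly from the existing router weights; nothing new is trained. Because the router subspace has at most `num_experts` dimensions, the control signal is low-bandwidth relative to the full representation, which is the intuition behind the third key point.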
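And a hypothetical sketch of treating the route over layers as the unit of analysis, per the last key point. The `router_logits_per_layer` input stands in for whatever per-layer routing scores a given MoE implementation exposes; none of these names come from the paper.

```python
import numpy as np

def token_trajectory(router_logits_per_layer, top_k: int = 2):
    """Map one token's per-layer router logits to its expert path.

    router_logits_per_layer: list of (num_experts,) arrays, one per layer.
    Returns a tuple of per-layer tuples of chosen expert indices, a
    hashable 'trajectory' key suitable for counting or clustering.
    """
    path = []
    for logits in router_logits_per_layer:
        top = np.argsort(logits)[-top_k:][::-1]  # top-k experts, best first
        path.append(tuple(int(i) for i in top))
    return tuple(path)

rng = np.random.default_rng(1)
logits = [rng.standard_normal(8) for _ in range(4)]  # 4 layers, 8 experts
print(token_trajectory(logits))                      # e.g. ((3, 5), (0, 2), ...)
```

Two tokens with the same surface form (e.g., a colon used as a type annotation vs. a time separator) can yield different trajectory keys if their control-subspace representations differ, which is the sense in which paths can be monosemantic even when individual experts are not.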
Related Articles

To what extent could AI replace us in our jobs? Sometimes I think people exaggerate a bit.
Reddit r/artificial

Magnificent irony as Meta staff unhappy about running surveillance software on work PCs
The Register

ETHENEA (ETHENEA Americas LLC) Analyst View: Asset Allocation Resilience in the 2026 Global Macro Cycle
Dev.to

DEEPX and Hyundai Are Building Generative AI Robots
Dev.to

Stop Paying OpenAI to Read Garbage: The Two-Stage Agent Pipeline
Dev.to