Geometric Routing Enables Causal Expert Control in Mixture of Experts

arXiv cs.AI / 4/17/2026

📰 News · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that while sparse Mixture-of-Experts (MoE) models improve scaling efficiency, expert specialization is usually opaque, and it introduces a method to make expert identity causally interpretable.
  • It shows that rank-1 experts are monosemantic by construction, and that cosine-similarity routing in a low-dimensional metric space makes their specialization directly inspectable.
  • Using a “Semantic Dictionary” derived by projecting expert output vectors through the unembedding matrix, the authors report that about 15% of experts behave as monosemantic specialists spanning categories such as temporal, geographic, emotional, financial, and military.
  • The paper offers multiple lines of validation: a frequency-to-syntax routing gradient across layers (early layers separate tokens by frequency, deeper layers by syntactic class) and causal interventions that strongly shift category probabilities (e.g., +321% for temporal steering, −23% for geographic suppression).
  • It claims this controllability comes with zero inference overhead and that, although linear routers support comparable steering, only cosine routing offers “geometric transparency.”
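The cosine-similarity routing the key points describe can be sketched as follows. This is a minimal illustration, not the paper's implementation: the centroid matrix, top-k value, and temperature are all assumed names and defaults.

```python
import numpy as np

def cosine_route(x, centroids, top_k=2, temperature=1.0):
    """Route one token to experts by cosine similarity (illustrative sketch).

    x: (d,) token representation in the low-dimensional routing space.
    centroids: (n_experts, d) matrix of learned expert centroids -- under
    cosine routing, this matrix is what makes specialization readable.
    Returns (top-k expert indices, softmax weights over those experts).
    """
    # Normalize both sides so the dot product equals cosine similarity.
    x_hat = x / (np.linalg.norm(x) + 1e-8)
    c_hat = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-8)
    sims = c_hat @ x_hat                       # (n_experts,) cosine scores
    top = np.argsort(sims)[::-1][:top_k]       # best-matching experts
    logits = sims[top] / temperature
    weights = np.exp(logits - logits.max())    # stable softmax over top-k
    weights /= weights.sum()
    return top, weights
```

Because routing scores are cosines against explicit centroids, inspecting a row of `centroids` directly tells you which token directions an expert attracts; a linear router's weight vector admits no such metric reading.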

Abstract

Sparse Mixture-of-Experts (MoE) models scale parameters while fixing active computation per token, but the specialization of individual experts remains opaque. In a companion paper we showed that routing topology is quality-neutral: five structurally different configurations converge to statistically equivalent language modeling quality. Here we show that expert identity is nonetheless causally meaningful: individual rank-1 experts are monosemantic by construction, and cosine-similarity routing in a low-dimensional metric space makes their specialization directly inspectable. We present four lines of evidence. First, projecting expert output vectors through the unembedding matrix yields a Semantic Dictionary: 15% of experts are monosemantic specialists spanning 10 categories (temporal, geographic, cardinal, discourse, emotional, financial, military, scientific). Second, routing exhibits a frequency-to-syntax gradient: early layers separate tokens by word frequency, deeper layers by syntactic class (Zipf-confound controls, all p < 0.001). Third, causal interventions confirm these labels: steering toward a temporal expert's centroid increases P(temporal) by +321% (median across 44 prompts); suppressing a geographic expert drops P(geographic) by -23%; rewriting an expert's output vector halves target-category probability, and effects compose additively across layers. Fourth, the interventions are not unique to cosine routing: linear routers support comparable steering, but only cosine routing provides geometric transparency -- expert specialization is readable directly from the centroid matrix. MoE expert-level specialization is a first-class interpretability primitive: architecturally monosemantic, causally validated, and controllable at inference with zero overhead.
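Two of the abstract's evidence lines, the Semantic Dictionary projection and centroid steering, can be sketched in a few lines. All shapes, names, and the `alpha` steering scale below are assumptions for illustration; the paper's actual code is not shown here.

```python
import numpy as np

def semantic_dictionary(expert_out, W_U, vocab, top_n=5):
    """Read off the tokens an expert most promotes (illustrative sketch).

    expert_out: (d_model,) output direction of a rank-1 expert.
    W_U: (d_model, vocab_size) unembedding matrix.
    vocab: list of token strings, length vocab_size.
    Projecting through W_U gives each token's logit contribution; the
    top tokens label the expert (e.g., temporal, geographic).
    """
    logits = expert_out @ W_U                  # (vocab_size,) logit contributions
    top = np.argsort(logits)[::-1][:top_n]
    return [(vocab[i], float(logits[i])) for i in top]

def steer_toward_centroid(h, centroid, alpha=0.5):
    """Causal intervention sketch: nudge a hidden state toward an
    expert's centroid direction. Negative alpha suppresses the expert."""
    direction = centroid / (np.linalg.norm(centroid) + 1e-8)
    return h + alpha * direction
```

In the paper's terms, the first function builds one Semantic Dictionary entry, and the second is the kind of steering whose effect on category probabilities (+321% temporal, −23% geographic) validates the dictionary labels causally.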