Geometric Routing Enables Causal Expert Control in Mixture of Experts

arXiv cs.AI / 4/17/2026

📰 News · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that while sparse Mixture-of-Experts (MoE) models improve scaling efficiency, expert specialization is usually opaque, and it introduces a method to make expert identity causally interpretable.
  • It shows that rank-1 experts are monosemantic by construction, and that cosine-similarity routing in a low-dimensional metric space makes their specialization directly inspectable.
  • Using a “Semantic Dictionary” derived by projecting expert output vectors through the unembedding matrix, the authors report that about 15% of experts behave as monosemantic specialists spanning categories such as temporal, geographic, emotional, financial, and military.
  • The paper offers multiple lines of validation: a frequency-to-syntax routing gradient across layers (early layers separate tokens by frequency, deeper layers by syntactic class) and causal interventions that strongly shift category probabilities (e.g., +321% for temporal steering, −23% for geographic suppression).
  • It claims this controllability comes with zero inference overhead and that, although linear routers support comparable steering, only cosine routing offers “geometric transparency.”
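The cosine-similarity routing the key points describe can be sketched as follows. This is a minimal illustration, not the paper's implementation: the centroid matrix, top-k value, and temperature are all assumed names and defaults.

```python
import numpy as np

def cosine_route(x, centroids, top_k=2, temperature=1.0):
    """Route one token to experts by cosine similarity (illustrative sketch).

    x: (d,) token representation in the low-dimensional routing space.
    centroids: (n_experts, d) matrix of learned expert centroids -- under
    cosine routing, this matrix is what makes specialization readable.
    Returns (top-k expert indices, softmax weights over those experts).
    """
    # Normalize both sides so the dot product equals cosine similarity.
    x_hat = x / (np.linalg.norm(x) + 1e-8)
    c_hat = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-8)
    sims = c_hat @ x_hat                       # (n_experts,) cosine scores
    top = np.argsort(sims)[::-1][:top_k]       # best-matching experts
    logits = sims[top] / temperature
    weights = np.exp(logits - logits.max())    # stable softmax over top-k
    weights /= weights.sum()
    return top, weights
```

Because routing scores are cosines against explicit centroids, inspecting a row of `centroids` directly tells you which token directions an expert attracts; a linear router's weight vector admits no such metric reading.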

Abstract

Sparse Mixture-of-Experts (MoE) models scale parameters while fixing active computation per token, but the specialization of individual experts remains opaque. In a companion paper we showed that routing topology is quality-neutral: five structurally different configurations converge to statistically equivalent language modeling quality. Here we show that expert identity is nonetheless causally meaningful: individual rank-1 experts are monosemantic by construction, and cosine-similarity routing in a low-dimensional metric space makes their specialization directly inspectable. We present four lines of evidence. First, projecting expert output vectors through the unembedding matrix yields a Semantic Dictionary: 15% of experts are monosemantic specialists spanning 10 categories (temporal, geographic, cardinal, discourse, emotional, financial, military, scientific). Second, routing exhibits a frequency-to-syntax gradient: early layers separate tokens by word frequency, deeper layers by syntactic class (Zipf-confound controls, all p < 0.001). Third, causal interventions confirm these labels: steering toward a temporal expert's centroid increases P(temporal) by +321% (median across 44 prompts); suppressing a geographic expert drops P(geographic) by -23%; rewriting an expert's output vector halves target-category probability, and effects compose additively across layers. Fourth, the interventions are not unique to cosine routing: linear routers support comparable steering, but only cosine routing provides geometric transparency -- expert specialization is readable directly from the centroid matrix. MoE expert-level specialization is a first-class interpretability primitive: architecturally monosemantic, causally validated, and controllable at inference with zero overhead.
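Two of the abstract's evidence lines, the Semantic Dictionary projection and centroid steering, can be sketched in a few lines. All shapes, names, and the `alpha` steering scale below are assumptions for illustration; the paper's actual code is not shown here.

```python
import numpy as np

def semantic_dictionary(expert_out, W_U, vocab, top_n=5):
    """Read off the tokens an expert most promotes (illustrative sketch).

    expert_out: (d_model,) output direction of a rank-1 expert.
    W_U: (d_model, vocab_size) unembedding matrix.
    vocab: list of token strings, length vocab_size.
    Projecting through W_U gives each token's logit contribution; the
    top tokens label the expert (e.g., temporal, geographic).
    """
    logits = expert_out @ W_U                  # (vocab_size,) logit contributions
    top = np.argsort(logits)[::-1][:top_n]
    return [(vocab[i], float(logits[i])) for i in top]

def steer_toward_centroid(h, centroid, alpha=0.5):
    """Causal intervention sketch: nudge a hidden state toward an
    expert's centroid direction. Negative alpha suppresses the expert."""
    direction = centroid / (np.linalg.norm(centroid) + 1e-8)
    return h + alpha * direction
```

In the paper's terms, the first function builds one Semantic Dictionary entry, and the second is the kind of steering whose effect on category probabilities (+321% temporal, −23% geographic) validates the dictionary labels causally.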