Self-Routing: Parameter-Free Expert Routing from Hidden States
arXiv cs.AI / 4/2/2026
Key Points
- The paper introduces “Self-Routing,” a parameter-free Mixture-of-Experts routing method that turns a designated hidden-state subspace directly into expert logits, removing the need for a learned router projection module (see the PyTorch sketch after this list).
- Experiments on GPT-2-scale language modeling show Self-Routing performs competitively with a standard learned-router baseline, while eliminating all dedicated routing parameters.
- Self-Routing improves expert utilization balance, achieving about 17% higher average normalized routing entropy and avoiding an explicit load-balancing loss.
- On ImageNet-1K with DeiT-S/16, Self-Routing slightly outperforms the corresponding learned-router MoE, indicating the approach can generalize beyond language models.
- The authors conclude that effective MoE routing can be derived from the model’s hidden representations themselves, challenging the assumption that a dedicated learned router is strictly necessary.
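
The following is a minimal PyTorch sketch of the mechanism described in the first bullet: expert logits are read directly from a slice of the hidden state rather than produced by a learned router projection. The class and function names (`SelfRoutingMoE`, `Expert`, `normalized_routing_entropy`), the choice of the first `num_experts` hidden dimensions as the routing subspace, and top-2 routing are all illustrative assumptions, not details taken from the paper.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A standard two-layer feed-forward expert."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SelfRoutingMoE(nn.Module):
    """MoE layer whose routing logits are read directly from a designated
    slice of the hidden state, so it holds no dedicated router weights.
    The slice choice (first `num_experts` dims) is an assumption here."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        assert num_experts <= d_model, "routing subspace must fit in the hidden state"
        self.num_experts, self.top_k = num_experts, top_k
        self.experts = nn.ModuleList(
            [Expert(d_model, d_ff) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        # Parameter-free routing: reuse a hidden-state subspace as expert
        # logits instead of projecting through a learned router matrix.
        logits = x[:, : self.num_experts]
        probs = F.softmax(logits, dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-k

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.num_experts):
                mask = idx[:, slot] == e  # tokens whose slot `slot` picked expert e
                if mask.any():
                    out[mask] += weights[mask, slot : slot + 1] * self.experts[e](x[mask])
        return out


def normalized_routing_entropy(logits: torch.Tensor) -> float:
    """Per-token routing entropy normalized by log(num_experts), averaged
    over tokens; 1.0 means perfectly uniform routing. This is one plausible
    reading of the paper's 'average normalized routing entropy' metric."""
    probs = F.softmax(logits, dim=-1)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return (ent.mean() / math.log(logits.shape[-1])).item()
```

For example, `SelfRoutingMoE(d_model=256, d_ff=1024, num_experts=8)` routes a `(tokens, 256)` batch with zero routing parameters, and the same `logits` slice can be passed to `normalized_routing_entropy` to track the utilization balance the third bullet reports on.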