Expert Selections In MoE Models Reveal (Almost) As Much As Text
arXiv cs.CL / 3/16/2026
Key Points
- A text-reconstruction attack on mixture-of-experts (MoE) language models shows that tokens can be recovered using only expert routing selections.
- Compared with prior logistic-regression approaches, a 3-layer MLP improves top-1 reconstruction to 63.1%, and a transformer-based sequence decoder reaches 91.2% top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100 million tokens (a minimal sketch of the per-token attack appears after this list).
- The results connect MoE routing information to the broader embedding-inversion literature and highlight practical leakage scenarios such as distributed inference and side channels.
- Adding noise to the routing reduces reconstruction accuracy but does not eliminate it, underscoring the need to treat expert selections as no less sensitive than the underlying text (a sketch of such a defense also follows).
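To make the attack surface concrete, here is a minimal sketch of the per-token reconstruction idea: an MLP that maps one token's expert routing selections to token logits. The layer count, expert count, top-k, vocabulary size, and all hyperparameters below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the per-token reconstruction attack: routing -> token id.
# All shapes and sizes here are assumed placeholders, not the paper's setup.
import torch
import torch.nn as nn

NUM_LAYERS, NUM_EXPERTS, TOP_K = 12, 64, 2   # assumed MoE routing shape
VOCAB_SIZE = 50_257                          # e.g. a GPT-2-style vocabulary

class RoutingToTokenMLP(nn.Module):
    """3-layer MLP mapping one token's expert selections to token logits."""
    def __init__(self, hidden=1024):
        super().__init__()
        # Encode the (layer, expert) selections as one flat multi-hot vector.
        in_dim = NUM_LAYERS * NUM_EXPERTS
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, VOCAB_SIZE),
        )

    def forward(self, expert_ids):
        # expert_ids: (batch, NUM_LAYERS, TOP_K) integer expert indices.
        multi_hot = torch.zeros(expert_ids.size(0), NUM_LAYERS, NUM_EXPERTS)
        multi_hot.scatter_(2, expert_ids, 1.0)
        return self.net(multi_hot.flatten(1))

model = RoutingToTokenMLP()
routes = torch.randint(0, NUM_EXPERTS, (8, NUM_LAYERS, TOP_K))  # dummy batch
logits = model(routes)        # (8, VOCAB_SIZE) token logits
top1 = logits.argmax(dim=-1)  # predicted token ids
```

In a real attack, such a model would be trained on (routing, token) pairs collected by running known text through the MoE model; the transformer decoder in the paper extends this idea to whole sequences.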
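The noise defense mentioned in the last key point can be sketched as randomly re-routing a fraction of expert selections before they are exposed. The flip mechanism and probability below are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch of a routing-noise defense: with probability flip_prob,
# replace each exposed routing choice with a uniformly random expert.
# The mechanism and flip_prob are assumptions for illustration only.
import torch

def noise_expert_selections(expert_ids, num_experts, flip_prob=0.1):
    """Randomize each routing choice independently with probability flip_prob."""
    random_ids = torch.randint_like(expert_ids, num_experts)
    flip = torch.rand(expert_ids.shape, device=expert_ids.device) < flip_prob
    return torch.where(flip, random_ids, expert_ids)

# Dummy batch of per-token routing decisions: (batch, layers, top_k).
routes = torch.randint(0, 64, (8, 12, 2))
noisy = noise_expert_selections(routes, num_experts=64, flip_prob=0.2)
```

As the key points note, this kind of perturbation only degrades reconstruction; surviving clean selections still leak information about the text.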