AI Navigate

Expert Selections In MoE Models Reveal (Almost) As Much As Text

arXiv cs.CL / 3/16/2026


Key Points

  • A text-reconstruction attack on mixture-of-experts (MoE) language models shows that tokens can be recovered using only expert routing selections.
  • Whereas prior logistic-regression approaches achieve only limited reconstruction, a 3-layer MLP reaches 63.1% top-1 accuracy, and a transformer-based sequence decoder achieves 91.2% top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100 million tokens.
  • The results connect MoE routing information to the broader embedding-inversion literature and highlight practical leakage scenarios such as distributed inference and side channels.
  • Adding noise reduces reconstruction but does not eliminate it, underscoring the need to treat expert selections as no less sensitive than the underlying text.
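The routing-as-fingerprint intuition behind the attack can be illustrated with a toy simulation. Everything here is an assumption for illustration: the vocabulary and layer sizes are made up, and the router is idealized as deterministic per token (real MoE routing is context-dependent, which is why the paper needs learned decoders rather than a lookup table):

```python
import random

random.seed(0)

# Toy sizes -- illustrative assumptions, not the paper's configuration.
VOCAB = 500    # vocabulary size
LAYERS = 4     # MoE layers whose routing the attacker observes
EXPERTS = 8    # experts per layer
TOP_K = 2      # experts activated per token per layer

def route(token):
    """Hypothetical router: each token deterministically activates a
    top-k expert set per layer (an idealization of real routing)."""
    rng = random.Random(token)
    return tuple(tuple(sorted(rng.sample(range(EXPERTS), TOP_K)))
                 for _ in range(LAYERS))

# Phase 1: the attacker runs known text through the model and records
# (routing pattern -> token) pairs, standing in for training on public corpora.
lookup = {route(t): t for t in range(VOCAB)}

# Phase 2: recover a "private" 32-token sequence from routing alone.
secret = [random.randrange(VOCAB) for _ in range(32)]
recovered = [lookup.get(route(t)) for t in secret]
top1 = sum(r == t for r, t in zip(recovered, secret)) / len(secret)
print(f"top-1 recovery: {top1:.2f}")
```

With 28 possible expert pairs per layer across 4 layers, the pattern space dwarfs the toy vocabulary, so routing patterns act as near-unique token fingerprints; the paper's MLP and sequence decoders can be read as learned, noise-tolerant versions of this lookup.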

Abstract

We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as no less sensitive than the underlying text.
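The abstract's claim that noise reduces but does not eliminate reconstruction can also be sketched in a toy setting. All sizes and the deterministic per-token router below are illustrative assumptions (not the paper's setup or defense), and the attacker is a simple nearest-neighbour matcher rather than the paper's learned decoder:

```python
import random

random.seed(1)

# Toy sizes -- illustrative assumptions, not the paper's configuration.
VOCAB, LAYERS, EXPERTS, TOP_K = 500, 4, 8, 2

def route(token):
    """Hypothetical router: each token deterministically activates a
    top-k expert set per layer (an idealization of real routing)."""
    rng = random.Random(token)
    return tuple(tuple(sorted(rng.sample(range(EXPERTS), TOP_K)))
                 for _ in range(LAYERS))

ROUTES = [route(t) for t in range(VOCAB)]  # attacker's precomputed table

def noisy_route(token, p):
    """Toy defense: with probability p, replace each layer's expert set
    with a uniformly random one before the attacker observes it."""
    return tuple(
        tuple(sorted(random.sample(range(EXPERTS), TOP_K)))
        if random.random() < p else layer
        for layer in ROUTES[token]
    )

def recovery_rate(p, trials=500):
    """Attacker scores every candidate token by how many layers of its
    clean routing pattern agree with the noisy observation."""
    hits = 0
    for _ in range(trials):
        t = random.randrange(VOCAB)
        obs = noisy_route(t, p)
        guess = max(range(VOCAB),
                    key=lambda c: sum(a == b for a, b in zip(ROUTES[c], obs)))
        hits += (guess == t)
    return hits / trials

clean = recovery_rate(0.0)
noisy = recovery_rate(0.5)
print(f"clean: {clean:.2f}  noisy (p=0.5): {noisy:.2f}")
```

Even when half of the observed expert sets are randomized, the surviving layers still agree with the true token far more often than with any other candidate, so recovery stays well above the 1/500 chance level; this mirrors the qualitative finding that noise degrades but does not eliminate the leak.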