Expert Selections In MoE Models Reveal (Almost) As Much As Text
arXiv cs.CL / 3/16/2026
Key Points
- A text-reconstruction attack on mixture-of-experts (MoE) language models shows that tokens can be recovered using only expert routing selections.
- Compared with prior logistic-regression approaches, a 3-layer MLP improves top-1 reconstruction to 63.1%, and a transformer-based sequence decoder reaches 91.2% top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100 million tokens (a minimal sketch of the per-token attack appears after this list).
- The results connect MoE routing information to the broader embedding-inversion literature and highlight practical leakage scenarios such as distributed inference and side channels.
- Adding noise to the routing reduces reconstruction accuracy but does not eliminate it, underscoring the need to treat expert selections as no less sensitive than the underlying text (a sketch of such a defense also follows).
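To make the attack surface concrete, here is a minimal sketch of the per-token reconstruction idea: an MLP that maps one token's expert routing selections to token logits. The layer count, expert count, top-k, vocabulary size, and all hyperparameters below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the per-token reconstruction attack: routing -> token id.
# All shapes and sizes here are assumed placeholders, not the paper's setup.
import torch
import torch.nn as nn

NUM_LAYERS, NUM_EXPERTS, TOP_K = 12, 64, 2   # assumed MoE routing shape
VOCAB_SIZE = 50_257                          # e.g. a GPT-2-style vocabulary

class RoutingToTokenMLP(nn.Module):
    """3-layer MLP mapping one token's expert selections to token logits."""
    def __init__(self, hidden=1024):
        super().__init__()
        # Encode the (layer, expert) selections as one flat multi-hot vector.
        in_dim = NUM_LAYERS * NUM_EXPERTS
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, VOCAB_SIZE),
        )

    def forward(self, expert_ids):
        # expert_ids: (batch, NUM_LAYERS, TOP_K) integer expert indices.
        multi_hot = torch.zeros(expert_ids.size(0), NUM_LAYERS, NUM_EXPERTS)
        multi_hot.scatter_(2, expert_ids, 1.0)
        return self.net(multi_hot.flatten(1))

model = RoutingToTokenMLP()
routes = torch.randint(0, NUM_EXPERTS, (8, NUM_LAYERS, TOP_K))  # dummy batch
logits = model(routes)        # (8, VOCAB_SIZE) token logits
top1 = logits.argmax(dim=-1)  # predicted token ids
```

In a real attack, such a model would be trained on (routing, token) pairs collected by running known text through the MoE model; the transformer decoder in the paper extends this idea to whole sequences.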
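The noise defense mentioned in the last key point can be sketched as randomly re-routing a fraction of expert selections before they are exposed. The flip mechanism and probability below are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch of a routing-noise defense: with probability flip_prob,
# replace each exposed routing choice with a uniformly random expert.
# The mechanism and flip_prob are assumptions for illustration only.
import torch

def noise_expert_selections(expert_ids, num_experts, flip_prob=0.1):
    """Randomize each routing choice independently with probability flip_prob."""
    random_ids = torch.randint_like(expert_ids, num_experts)
    flip = torch.rand(expert_ids.shape, device=expert_ids.device) < flip_prob
    return torch.where(flip, random_ids, expert_ids)

# Dummy batch of per-token routing decisions: (batch, layers, top_k).
routes = torch.randint(0, 64, (8, 12, 2))
noisy = noise_expert_selections(routes, num_experts=64, flip_prob=0.2)
```

As the key points note, this kind of perturbation only degrades reconstruction; surviving clean selections still leak information about the text.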