Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs

arXiv stat.ML / 3/27/2026


Key Points

  • The paper presents a communication-theoretic perspective on Mixture-of-Experts (MoE) gating by treating the gate as a stochastic channel constrained by a finite information rate.
  • It specializes a mutual-information generalization bound to the MoE setting and develops a rate–distortion characterization D(R_g) of finite-rate gating, where the gating rate is R_g = I(X;T).
  • Under an empirical rate–distortion optimality assumption, the authors upper-bound the expected error E[R(W)] by the distortion term D(R_g) plus a term δ_m and a sample-size-dependent complexity term √((2/m) I(S;W)).
  • The results provide capacity-aware limits for communication-constrained MoE systems, explicitly quantifying trade-offs among gating rate, model expressivity, and generalization performance.
  • Synthetic experiments with multi-expert models empirically validate the predicted relationships between gating rate and generalization.
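
To make the gating rate R_g = I(X;T) from the points above concrete, here is a minimal sketch that computes it for a small discrete gate. It assumes a finite input alphabet and a gate given as a table of conditional probabilities p(t | x); the function name `gating_rate_bits` and the toy distributions are illustrative and do not come from the paper.

```python
import math

def gating_rate_bits(p_x, gate):
    """Mutual information I(X;T) in bits for a discrete stochastic gate.

    p_x:  list of n probabilities over input symbols (assumed discretization).
    gate: n rows, each a list of k probabilities p(t | x_i) over k experts.
    """
    k = len(gate[0])
    # Marginal routing distribution p(t) = sum_x p(x) p(t | x).
    p_t = [sum(p_x[i] * gate[i][t] for i in range(len(p_x))) for t in range(k)]
    rate = 0.0
    for i, px in enumerate(p_x):
        for t in range(k):
            joint = px * gate[i][t]
            if joint > 0.0:  # convention: 0 * log 0 = 0
                rate += joint * math.log2(gate[i][t] / p_t[t])
    return rate

# Toy example (hypothetical): 4 equally likely inputs, 4 experts.
p_x = [0.25] * 4
hard_gate = [[1.0 if t == i else 0.0 for t in range(4)] for i in range(4)]
blind_gate = [[0.25] * 4 for _ in range(4)]

print(gating_rate_bits(p_x, hard_gate))   # deterministic routing, one expert per input
print(gating_rate_bits(p_x, blind_gate))  # routing that ignores the input
```

In this toy setting the deterministic gate attains the maximum rate log2(4) = 2 bits, while the input-independent gate has rate 0; a finite-rate gate sits somewhere between these two extremes.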

Abstract

Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize a mutual-information generalization bound and develop a rate-distortion characterization $D(R_g)$ of finite-rate gating, where $R_g := I(X;T)$, yielding (under a standard empirical rate-distortion optimality condition) $\mathbb{E}[R(W)] \le D(R_g) + \delta_m + \sqrt{(2/m)\, I(S;W)}$. The analysis yields capacity-aware limits for communication-constrained MoE systems, and numerical simulations on synthetic multi-expert models empirically confirm the predicted trade-offs between gating rate, expressivity, and generalization.
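
As a rough illustration of how the three terms in the bound combine, the sketch below evaluates the right-hand side D(R_g) + δ_m + sqrt((2/m) I(S;W)) for given inputs. All numeric values are hypothetical placeholders rather than results from the paper, and measuring I(S;W) in nats is an assumption made here for the example.

```python
import math

def bound_rhs(distortion, delta_m, m, info_sw):
    """Right-hand side of the stated bound.

    distortion: value of D(R_g) at the chosen gating rate (placeholder input).
    delta_m:    the delta_m term from the abstract (placeholder input).
    m:          number of training samples.
    info_sw:    mutual information I(S;W), assumed here to be in nats.
    """
    return distortion + delta_m + math.sqrt((2.0 / m) * info_sw)

# Hypothetical numbers, chosen only to show the arithmetic.
small_sample = bound_rhs(distortion=0.10, delta_m=0.02, m=1_000, info_sw=5.0)
large_sample = bound_rhs(distortion=0.10, delta_m=0.02, m=100_000, info_sw=5.0)
print(small_sample, large_sample)
```

With the distortion and δ_m inputs held fixed, only the square-root term shrinks as m grows, so in this sketch the value approaches D(R_g) + δ_m for large samples.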
