AI Navigate

Has increasing the number of experts used in MoE models ever meaningfully helped?

Reddit r/LocalLLaMA / 3/16/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Models & Research

Key Points

  • The post revisits whether raising the number of active experts per token in Mixture-of-Experts (MoE) models yields meaningful performance gains, citing past debate around Qwen3-30B-A3B and the short-lived community "Qwen3-30b-A6B" configuration (the same model run with roughly double the active experts).
  • It notes that overriding the active-expert count is easy to do in llama.cpp (see the sketch after this list), but there has been little visible experimentation with higher counts recently.
  • The author is explicitly asking the community if anyone has conducted new tests or measurements with more MoE experts.
  • The discussion underscores ongoing uncertainty about this form of MoE scaling and the need for empirical comparisons of accuracy, compute, and memory cost as the active-expert count grows.
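
For readers who want to try the experiment the post alludes to, here is a minimal sketch of overriding the active-expert count at model load time. It assumes llama-cpp-python's kv_overrides parameter and that the GGUF metadata key for Qwen3 MoE models is qwen3moe.expert_used_count; the model path and prompt are hypothetical, so check your own file's metadata for the exact key name.

```python
# Minimal sketch (not from the post): raise the number of active MoE
# experts per token by overriding GGUF metadata at load time with
# llama-cpp-python. The key "qwen3moe.expert_used_count" and the model
# path are assumptions for a Qwen3 MoE GGUF.
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3-30B-A3B-Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,
    # Qwen3-30B-A3B activates 8 of 128 experts by default; doubling to 16
    # is roughly what the community's "Qwen3-30b-A6B" nickname described.
    kv_overrides={"qwen3moe.expert_used_count": 16},
)

out = llm("Explain mixture-of-experts routing in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The equivalent llama.cpp CLI form is --override-kv qwen3moe.expert_used_count=int:16, passed to llama-cli or llama-server. Any comparison across counts should track throughput and memory alongside accuracy, since each additional active expert adds compute per token.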

I remember there was a lot of debate over whether this was worthwhile back when Qwen3-30B-A3B came out. A few people even swore by "Qwen3-30b-A6B" for a short while.

It's still easy to configure in llama.cpp, but I don't really see any experimentation with it anymore.

Has anyone done much testing with this?

submitted by /u/ForsookComparison