AI Navigate

FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach

arXiv cs.CV / 3/17/2026


Key Points

  • FineRMoE expands fine-grained MoE design to both intermediate and output dimensions to surpass the single-dimension limit on granularity.
  • It introduces a bi-level sparse forward computation scheme and a specialized router to control which experts are activated.
  • The paper proposes a cost-effective upcycling method to build FineRMoE without training from scratch, reducing resource requirements.
  • Against the strongest baseline, FineRMoE achieves 6x higher parameter efficiency, 281x lower prefill latency, and 136x higher decoding throughput at inference, alongside superior results on ten standard benchmarks.
  • The approach signals a path toward more efficient, scalable MoE deployments in real-world systems.
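The summary does not give the paper's exact formulation, but the core idea of a bi-level sparse forward pass can be illustrated with a minimal sketch: partition the FFN's intermediate dimension and output dimension into expert blocks, and have a router select a top-k subset at each level so only those blocks are computed. All names, block sizes, and the simple linear routers below are hypothetical, not taken from FineRMoE.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 8, 32
n_inter, n_out = 4, 4      # expert blocks along intermediate / output dims (hypothetical)
k_inter, k_out = 2, 2      # experts activated per level (hypothetical)

# Intermediate experts are row-blocks of W_in; output experts are row-blocks of W_out.
W_in = rng.standard_normal((d_ff, d_model))
W_out = rng.standard_normal((d_model, d_ff))
R_inter = rng.standard_normal((n_inter, d_model))  # level-1 router (illustrative)
R_out = rng.standard_normal((n_out, d_model))      # level-2 router (illustrative)

def topk(scores, k):
    """Indices of the k highest-scoring experts."""
    return np.argsort(scores)[-k:]

def bilevel_moe(x):
    # Level 1: activate k_inter blocks of the intermediate dimension.
    blk = d_ff // n_inter
    h = np.zeros(d_ff)
    for i in topk(R_inter @ x, k_inter):
        sl = slice(i * blk, (i + 1) * blk)
        h[sl] = np.maximum(W_in[sl] @ x, 0.0)  # ReLU on active blocks only
    # Level 2: activate k_out blocks of the output dimension.
    oblk = d_model // n_out
    y = np.zeros(d_model)
    for j in topk(R_out @ x, k_out):
        sl = slice(j * oblk, (j + 1) * oblk)
        y[sl] = W_out[sl] @ h
    return y

y = bilevel_moe(rng.standard_normal(d_model))
```

Because both levels are sparse, the cost per token scales with the activated blocks rather than the full FFN, which is the kind of saving the reported latency and throughput numbers point to.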

Abstract

As revealed by the scaling law of fine-grained MoE, model performance ceases to improve once the granularity of the intermediate dimension exceeds the optimal threshold, limiting further gains from single-dimension fine-grained design. To address this bottleneck, we propose FineRMoE (FineR-Grained MoE), an architecture that extends fine-grained expert design to both the intermediate and output dimensions, aiming to enhance expert specialization beyond the single-dimension limit. We further introduce a bi-level sparse forward computation paradigm and a specialized routing mechanism to govern expert activation. In addition, to obviate the prohibitive cost of training FineRMoE from scratch, we devise a generalized upcycling method that builds FineRMoE in a cost-effective manner. Extensive experiments demonstrate the superior performance achieved by FineRMoE across ten standard benchmarks. Compared with the strongest baseline, FineRMoE achieves 6 times higher parameter efficiency, 281 times lower prefill latency, and 136 times higher decoding throughput during inference.