Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

arXiv cs.AI / 5/4/2026

💬 Opinion | Developer Stack & Infrastructure | Models & Research

Key Points

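In mixture-of-experts (MoE) LLM serving, communication accounts for a large share of runtime, which has driven investment in expensive high-bandwidth scale-up networks. The paper questions whether that investment is necessary.
The paper systematically compares network cost-effectiveness across layers for four XPU topologies (scale-up, scale-out, 3D torus, and 3D full-mesh) and shows that switchless topologies outperform the scale-up topology in every serving scenario explored.
Switchless topologies improve cost-effectiveness by 20.6-56.2%, and the 3D full-mesh topology in particular is Pareto-optimal in the performance-cost tradeoff.
Scale-up link bandwidth also appears to be over-provisioned: reducing it can improve throughput per cost by up to 27%. The paper further projects that the cost advantage of switchless networks will likely persist on upcoming GPU generations.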

Abstract

Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whether such costly infrastructure is strictly necessary. We present the first systematic cross-layer analysis of network cost-effectiveness for MoE LLM serving, comparing four representative XPU (e.g., GPU/TPU) topologies (scale-up, scale-out, 3D torus, and 3D full-mesh). We find that lower-cost switchless topologies are more cost-effective than the scale-up topology across all serving scenarios explored, improving cost-effectiveness by 20.6-56.2%. In particular, the 3D full-mesh topology is Pareto-optimal in terms of the performance-cost tradeoff. We also find that current scale-up link bandwidths are over-provisioned: reducing the link bandwidth improves throughput per cost by up to 27%. A forward-looking analysis of upcoming GPU generations indicates that the cost-performance advantage of switchless networks will likely persist.
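The two metrics the abstract relies on, throughput per cost and Pareto-optimality over the performance-cost tradeoff, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Topology` class, the helper names, and the numbers for the placeholder options "A", "B", and "C" are made up and are not taken from the paper, whose actual cost model is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """A hypothetical network option; both fields hold made-up values."""
    name: str
    throughput: float  # serving throughput in arbitrary units (higher is better)
    cost: float        # network cost in arbitrary units (lower is better)

    @property
    def throughput_per_cost(self) -> float:
        return self.throughput / self.cost


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Option b dominates option a when b is at least as good on both
    axes (throughput and cost) and strictly better on at least one.
    """
    front = []
    for a in candidates:
        dominated = any(
            b.throughput >= a.throughput
            and b.cost <= a.cost
            and (b.throughput > a.throughput or b.cost < a.cost)
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front


def relative_gain(candidate, baseline):
    """Fractional improvement in throughput per cost over a baseline."""
    return candidate.throughput_per_cost / baseline.throughput_per_cost - 1.0


# Placeholder numbers, NOT results from the paper.
options = [
    Topology("A", throughput=100.0, cost=10.0),
    Topology("B", throughput=90.0, cost=6.0),
    Topology("C", throughput=80.0, cost=8.0),
]

front_names = [t.name for t in pareto_front(options)]
gain_b_over_a = relative_gain(options[1], options[0])
```

With these placeholder values, option "C" is dominated by "B" (lower throughput at higher cost), so only "A" and "B" remain on the front. "B" gives up some raw throughput relative to "A" but delivers more throughput per unit cost, which is the kind of tradeoff the abstract describes between scale-up and switchless networks.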
