REAM: Merging Improves Pruning of Experts in LLMs

arXiv cs.AI / 4/7/2026


Key Points

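The paper proposes Router-weighted Expert Activation Merging (REAM), a memory-reduction method for Mixture-of-Experts (MoE) LLMs that merges experts instead of removing them, as the router-based expert pruning method REAP does.
Rather than deleting experts, REAM groups them and merges their weights, aiming to limit the performance degradation caused by compression.
Across multiple MoE LLMs, the authors compare REAM with REAP and other baselines on multiple-choice (MC) and generative (GEN) benchmarks and observe a trade-off between MC and GEN performance.
The trade-off depends on the ratio of general, math, and coding data in the calibration set. By adjusting that mix and analyzing the Pareto frontier, the authors report that REAM often outperforms the baselines and in many cases is comparable to the original uncompressed models.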
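To make the Pareto frontier analysis in the last key point concrete, here is a minimal sketch of selecting the non-dominated (MC, GEN) score pairs from a set of calibration mixes. The function name `pareto_frontier` and the scores in the usage lines are hypothetical placeholders, not results from the paper, and the paper's own analysis may differ.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float]  # (mc_score, gen_score); higher is better for both


def pareto_frontier(points: Sequence[Score]) -> List[Score]:
    """Return the points that no other point dominates.

    A point is dominated when another point is at least as good on both
    axes and strictly better on at least one.
    """
    frontier = []
    for i, (mc, gen) in enumerate(points):
        dominated = any(
            o_mc >= mc and o_gen >= gen and (o_mc > mc or o_gen > gen)
            for j, (o_mc, o_gen) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append((mc, gen))
    return sorted(frontier)


# Placeholder scores for four hypothetical calibration mixes (not paper data).
example_scores = [(0.60, 0.40), (0.55, 0.50), (0.50, 0.45), (0.65, 0.30)]
example_frontier = pareto_frontier(example_scores)
```

In this toy input, the mix scoring (0.50, 0.45) drops out because (0.55, 0.50) is better on both axes. The other three remain on the frontier, each trading MC against GEN performance.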

Abstract

Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges for deployment. Traditional approaches to reduce memory requirements include weight pruning and quantization. Motivated by the Router-weighted Expert Activation Pruning (REAP) that prunes experts, we propose a novel method, Router-weighted Expert Activation Merging (REAM). Instead of removing experts, REAM groups them and merges their weights, better preserving original performance. We evaluate REAM against REAP and other baselines across multiple MoE LLMs on diverse multiple-choice (MC) question answering and generative (GEN) benchmarks. Our results reveal a trade-off between MC and GEN performance that depends on the mix of calibration data. By controlling the mix of general, math and coding data, we examine the Pareto frontier of this trade-off and show that REAM often outperforms the baselines and in many cases is comparable to the original uncompressed models.
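The abstract does not spell out how experts are grouped or how their weights are combined, so the following is only an illustrative sketch of the general idea: each group of experts is collapsed into one expert by a weighted average. The per-expert `saliency` score stands in for some router-weighted activation statistic. That score, the predefined `groups`, and the function name `merge_experts` are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one expert's weight matrix as nested lists


def merge_experts(
    weights: Sequence[Matrix],
    saliency: Sequence[float],
    groups: Sequence[Sequence[int]],
) -> List[Matrix]:
    """Collapse each group of experts into one by saliency-weighted averaging.

    weights[i]  : weight matrix of expert i (all experts share one shape)
    saliency[i] : assumed non-negative importance score of expert i
    groups      : lists of expert indices; each list becomes one merged expert
    """
    merged = []
    for group in groups:
        total = sum(saliency[i] for i in group)
        if total > 0:
            coeffs = [saliency[i] / total for i in group]
        else:
            # No usable scores in this group: fall back to a uniform average.
            coeffs = [1.0 / len(group)] * len(group)
        rows = len(weights[group[0]])
        cols = len(weights[group[0]][0])
        merged.append([
            [sum(c * weights[i][r][k] for c, i in zip(coeffs, group))
             for k in range(cols)]
            for r in range(rows)
        ])
    return merged


# Three toy 1x2 experts; experts 0 and 1 are merged, expert 2 is kept alone.
toy_weights = [[[1.0, 0.0]], [[3.0, 4.0]], [[10.0, 10.0]]]
toy_merged = merge_experts(toy_weights, [1.0, 3.0, 2.0], [[0, 1], [2]])
```

Here three experts are reduced to two, and expert 1 contributes three times as much as expert 0 to the first merged expert. A pruning approach would instead discard the low-scoring expert's weights entirely.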


