LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

arXiv cs.LG / 4/6/2026


Key Points

  • LiME addresses a key limitation of MoE-PEFT, namely the linear growth of trainable parameters caused by replicating an adapter per expert, by applying lightweight output modulation to a single shared PEFT module.
  • LiME introduces zero-parameter routing, which requires no learnable router parameters: it reuses existing frozen and adapted representations, avoiding per-layer router training.
  • Theoretically, the authors show that increasing the number of experts preserves more task-relevant information, and that modulation can approximate expert-specific PEFT with bounded error.
  • On MMT-47 (47 tasks spanning text, image, and video), LiME reports competitive or superior performance with up to 4x fewer trainable parameters and up to 29% faster training compared to existing MoE-PEFT baselines.
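The modulation idea in the first key point can be sketched numerically. This is a minimal illustration under assumed shapes and names (not the authors' code): one shared LoRA-style adapter is trained, and each expert contributes only a lightweight output-modulation vector instead of a full adapter copy, so expert parameters shrink dramatically.

```python
# Sketch (assumed shapes/names, not the authors' implementation) of LiME-style
# expert modulation: one shared LoRA module whose output is rescaled by
# lightweight per-expert vectors, instead of replicating a full adapter per expert.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 16, 4, 8

# Single shared LoRA adapter (trainable): low-rank update W_up @ W_down.
W_down = rng.normal(size=(rank, d_in)) * 0.1
W_up = rng.normal(size=(d_out, rank)) * 0.1

# Per-expert modulation vectors (trainable): d_out params each, versus
# rank*d_in + d_out*rank params for a full per-expert adapter.
expert_vecs = 1.0 + rng.normal(size=(n_experts, d_out)) * 0.01

def lime_delta(x, gate):
    """Adapter update: gate-weighted mix of expert-modulated shared LoRA outputs."""
    shared = W_up @ (W_down @ x)        # shared PEFT output, shape (d_out,)
    per_expert = expert_vecs * shared   # broadcast modulation, (n_experts, d_out)
    return gate @ per_expert            # routing weights combine experts

x = rng.normal(size=d_in)
gate = np.zeros(n_experts)
gate[[1, 5]] = 0.5                      # e.g. top-2 routing
delta = lime_delta(x, gate)

full_params = n_experts * (rank * d_in + d_out * rank)       # adapter per expert
lime_params = (rank * d_in + d_out * rank) + n_experts * d_out
print(delta.shape, f"{full_params / lime_params:.1f}x fewer expert params")
```

With these toy dimensions the parameter ratio happens to be 4.0x; the actual savings depend on the PEFT rank, hidden size, and expert count.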

Abstract

MoE-PEFT methods combine Mixture of Experts with parameter-efficient fine-tuning for multi-task adaptation, but they require separate adapters per expert, causing trainable parameters to scale linearly with expert count and limiting applicability to adapter-based architectures. We propose LiME (Lightweight Mixture of Experts), which achieves expert specialization through lightweight modulation rather than adapter replication. Instead of separate adapters, LiME uses a single shared PEFT module and modulates its output with lightweight expert vectors, reducing expert parameters while generalizing to any PEFT method. Notably, LiME introduces zero-parameter routing by leveraging existing frozen and adapted representations, eliminating the learned router parameters typically required per layer. Theoretically, we prove that (i) more experts preserve more task-relevant information and (ii) modulation approximates full expert-specific PEFT with bounded error. LiME further incorporates n-gram windowed routing and adaptive expert selection (Auto Top-K) based on routing confidence. Experiments on MMT-47, a multimodal multi-task benchmark with 47 tasks spanning text, image, and video, demonstrate that LiME achieves competitive or superior performance while using up to 4x fewer trainable parameters and up to 29% faster training compared to corresponding MoE-PEFT baselines.
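The routing components named in the abstract can also be sketched. The following is a hedged illustration, not the paper's method: routing scores are assumed to come from similarity between a frozen backbone representation and fixed per-expert keys (so no router weights are learned), and "Auto Top-K" is approximated by keeping experts until the cumulative routing confidence passes a threshold `tau`; the key construction and threshold are assumptions.

```python
# Hedged sketch of zero-parameter routing with confidence-based Auto Top-K.
# Assumptions (not from the paper): scores are cosine similarities between the
# frozen representation and fixed expert keys; K grows until cumulative
# softmax confidence reaches tau, then selected weights are renormalized.
import numpy as np

rng = np.random.default_rng(1)
d, n_experts, tau = 16, 8, 0.7

expert_keys = rng.normal(size=(n_experts, d))  # fixed, not trained
h_frozen = rng.normal(size=d)                  # frozen backbone representation

def zero_param_route(h, keys, tau):
    # Cosine-similarity scores: no learnable router parameters involved.
    scores = keys @ h / (np.linalg.norm(keys, axis=1) * np.linalg.norm(h))
    probs = np.exp(scores) / np.exp(scores).sum()
    order = np.argsort(probs)[::-1]
    cum, chosen = 0.0, []
    for i in order:                            # Auto Top-K: adaptive expert count
        chosen.append(i)
        cum += probs[i]
        if cum >= tau:
            break
    gate = np.zeros_like(probs)
    gate[chosen] = probs[chosen] / probs[chosen].sum()  # renormalize selected
    return gate, len(chosen)

gate, k = zero_param_route(h_frozen, expert_keys, tau)
print(k, gate.sum())
```

A confident routing distribution selects few experts (small K), while an uncertain one spreads over more, which matches the abstract's description of adaptive expert selection based on routing confidence.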