TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

arXiv cs.LG / 4/9/2026


Key Points

  • TalkLoRA introduces a communication-aware MoE-based LoRA framework that adds a lightweight “Talking Module” to let low-rank LoRA experts exchange controlled information before routing.
  • The approach targets instability in existing MoE-LoRA methods caused by assuming experts are independent, aiming to reduce expert dominance and improve routing balance.
  • The paper provides theoretical results that expert communication smooths routing dynamics by mitigating perturbation amplification and strictly generalizes prior MoELoRA architectures.
  • Experiments on language understanding and generation tasks show consistent improvements over vanilla LoRA and MoELoRA while maintaining higher parameter efficiency under comparable budgets.
  • Code is released publicly, enabling researchers and practitioners to reproduce and build on the method for more stable parameter-efficient adaptation with MoE routing.

Abstract

Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often leading to unstable routing and expert dominance. In this paper, we propose **TalkLoRA**, a communication-aware MoELoRA framework that relaxes this independence assumption by introducing expert-level communication prior to routing. TalkLoRA equips low-rank experts with a lightweight Talking Module that enables controlled information exchange across expert subspaces, producing a more robust global signal for routing. Theoretically, we show that expert communication smooths routing dynamics by mitigating perturbation amplification while strictly generalizing existing MoELoRA architectures. Empirically, TalkLoRA consistently outperforms vanilla LoRA and MoELoRA across diverse language understanding and generation tasks, achieving higher parameter efficiency and more balanced expert routing under comparable parameter budgets. These results highlight structured expert communication as a principled and effective enhancement for MoE-based parameter-efficient adaptation. Code is available at https://github.com/why0129/TalkLoRA.
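To make the mechanism concrete, the sketch below shows one way the described forward pass could look: each LoRA expert computes its low-rank update, a "Talking" step mixes expert outputs across the expert axis before routing, and the router then combines the communicated signals. This is a minimal illustration, not the paper's implementation: the mixing matrix `T`, the router parameterization `W_gate`, and all tensor sizes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, E = 8, 2, 4  # hidden dim, LoRA rank, number of experts (illustrative sizes)

# Per-expert low-rank adapters: delta_i(x) = B_i @ (A_i @ x), as in standard LoRA.
A = rng.normal(size=(E, r, d)) * 0.1
B = rng.normal(size=(E, d, r)) * 0.1

# Hypothetical "Talking Module": a row-stochastic mixing matrix that lets expert
# signals exchange information before routing (assumed form; the paper's module
# may be parameterized differently). Uniform mixing = maximal communication;
# T = identity would recover independent experts, i.e., plain MoELoRA.
T = np.full((E, E), 1.0 / E)

W_gate = rng.normal(size=(E, d)) * 0.1  # router weights (assumed linear gate)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def talklora_forward(x):
    # 1) Each expert produces its low-rank update of the hidden state.
    expert_out = np.stack([B[i] @ (A[i] @ x) for i in range(E)])  # (E, d)
    # 2) Experts "talk": outputs are mixed across the expert axis before routing.
    talked = T @ expert_out                                       # (E, d)
    # 3) Router weights the communicated expert signals and sums them.
    gates = softmax(W_gate @ x)                                   # (E,)
    return (gates[:, None] * talked).sum(axis=0)                  # (d,)

x = rng.normal(size=d)
delta = talklora_forward(x)
print(delta.shape)  # (8,)
```

Setting `T` to the identity matrix removes communication and reduces the layer to an ordinary MoELoRA combination, which mirrors the abstract's claim that TalkLoRA strictly generalizes existing MoELoRA architectures.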