Towards Adaptive Continual Model Merging via Manifold-Aware Expert Evolution

arXiv cs.LG / 4/27/2026


Key Points

  • The paper addresses limitations of Continual Model Merging (CMM), highlighting a saturation–redundancy dilemma in backbone-centric methods and redundancy/routing bottlenecks in MoE variants.
  • It proposes MADE-IT, an adaptive CMM approach that uses manifold-aware expert evolution to manage expert representation diversity while keeping the architecture compact.
  • MADE-IT introduces a projection-based subspace affinity metric and a distribution-aware adaptive threshold to decide when and how experts should evolve autonomously.
  • It also avoids parameterized gating networks by using a data-free, training-free implicit routing method that activates experts through feature–subspace alignment.
  • Experiments reportedly show MADE-IT improves accuracy and robustness over long-horizon and shuffled task sequences, while pruning redundant experts, especially in generic modules and early layers.
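The summary above does not spell out how the affinity metric or the adaptive threshold is computed. As a rough illustration only, one plausible reading is sketched below, assuming each expert keeps an orthonormal basis of its task-update subspace (e.g., from a truncated SVD of its weight delta) and that the threshold is set from the empirical distribution of affinities; all function names and the mean-plus-std rule here are illustrative assumptions, not details from the paper:

```python
import numpy as np

def subspace_basis(delta: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis of a task update's dominant subspace via truncated SVD."""
    u, _, _ = np.linalg.svd(delta, full_matrices=False)
    return u[:, :rank]

def projection_affinity(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Affinity in [0, 1]: fraction of subspace A's energy captured by span(B)."""
    proj = basis_b @ (basis_b.T @ basis_a)  # project A's basis vectors onto span(B)
    return float(np.linalg.norm(proj) ** 2 / basis_a.shape[1])

def should_merge(new_basis, expert_bases, k: float = 1.0):
    """Distribution-aware decision: merge the new task into its closest expert
    only if that affinity stands out from the affinity distribution
    (here, mean + k * std — an assumed threshold rule); otherwise spawn a
    new expert by returning (None, None)."""
    affinities = np.array([projection_affinity(new_basis, b) for b in expert_bases])
    threshold = affinities.mean() + k * affinities.std()
    best = int(affinities.argmax())
    if affinities[best] >= threshold:
        return best, float(affinities[best])
    return None, None
```

Under this reading, a new task whose update subspace is well covered by an existing expert is absorbed into it, while a sufficiently novel subspace triggers expert growth — which is how diversity could be balanced against architectural parsimony without manual thresholds.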

Abstract

Continual Model Merging (CMM) sequentially integrates task-specific models into a unified architecture without intensive retraining. However, existing CMM methods are hindered by a fundamental saturation-redundancy dilemma: backbone-centric approaches face parameter saturation and representation interference within fixed capacities, whereas Mixture-of-Experts (MoE) variants resort to indiscriminate expansion, incurring expert redundancy and a routing bottleneck reliant on additional data-driven optimization. To resolve these challenges, we propose MADE-IT (Manifold-Aware Dynamic Expert Evolution and Implicit rouTing), an adaptive CMM method that orchestrates expert management and activation by grounding intrinsic expert representations in manifold geometry. We introduce a projection-based subspace affinity metric coupled with a distribution-aware adaptive threshold mechanism to guide autonomous expert evolution, harmonizing diversity with architectural parsimony. Furthermore, to bypass parameterized gating networks, we design a data-free and training-free implicit routing mechanism that activates experts via feature-subspace alignment. Extensive experiments demonstrate that MADE-IT consistently outperforms strong baselines in accuracy and robustness across long-horizon and shuffled task sequences, while significantly pruning redundant experts, particularly within generic modules and early layers.
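The abstract does not specify how feature–subspace alignment is scored at inference time. A minimal training-free sketch, again assuming each expert stores an orthonormal subspace basis (the function and scoring rule below are hypothetical, not taken from the paper), could route as follows:

```python
import numpy as np

def implicit_route(feature: np.ndarray, expert_bases: list, top_k: int = 1):
    """Data-free, training-free routing: score each expert by the fraction of
    the (normalized) input feature's energy that lies in that expert's
    subspace, then activate the top-k experts. No gating parameters are
    learned; scores come purely from feature-subspace alignment."""
    f = feature / (np.linalg.norm(feature) + 1e-12)
    scores = np.array([np.linalg.norm(b.T @ f) ** 2 for b in expert_bases])
    return np.argsort(scores)[::-1][:top_k], scores
```

Because the scores are just projection energies, such a router needs neither held-out data nor extra optimization, matching the paper's stated goal of bypassing parameterized gating networks.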