Abstract
Modern deep learning usually treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies model merging as an alternative paradigm: combining independently trained neural networks directly in weight space, with little or no additional optimization and without access to the original training data.
The thesis considers two main regimes. In the single-task setting, where models share an objective but differ in initialization, we introduce C^2M^3, a cycle-consistent merging algorithm based on Frank-Wolfe optimization. C^2M^3 aligns multiple networks into a shared, reference-free parameter space, making weight averaging meaningful without privileging any individual model.
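As a schematic illustration (the notation here is ours, not a verbatim statement of the algorithm): given models $\theta_1, \dots, \theta_N$ and layer-wise permutation maps $\pi_i$ into a common space, the merged model is the average of the aligned weights, and cycle consistency asks that composing pairwise maps around any cycle returns the identity,
\[
\bar{\theta} = \frac{1}{N} \sum_{i=1}^{N} \pi_i(\theta_i), \qquad P_{i \to j}\, P_{j \to k} = P_{i \to k} \quad \text{for all } i, j, k,
\]
where the $P$'s are the permutation matrices realizing the $\pi_i$ layer by layer. No $\theta_i$ is privileged as an alignment target, which is what makes the shared space reference-free.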
In the multi-task setting, where models are fine-tuned for different downstream tasks from a common pretrained initialization, we first develop a theoretical account of task vectors as approximate gradients. This explains both the effectiveness and the limitations of task arithmetic. Building on this view, we show that task vectors inherit the low-rank structure of gradients and introduce Task Singular Vectors (TSV), a decomposition that enables compression and interference reduction through TSV-Merge. We then present MASS, an input-adaptive routing method that uses TSV geometry to select task-relevant subspaces at inference time. Finally, we introduce MERGE^3, an evolutionary merging framework that uses Item Response Theory to reduce evaluation costs by up to 50\times while preserving solution quality.
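To fix notation for the central objects (a sketch, with symbols ours): writing $\theta_0$ for the pretrained weights and $\theta_t$ for the weights fine-tuned on task $t$, the task vector is $\tau_t = \theta_t - \theta_0$; a first-order view of fine-tuning with learning rate $\eta$ relates it to accumulated gradients, and TSV decomposes each layer of $\tau_t$ by SVD:
\[
\tau_t = \theta_t - \theta_0 \approx -\eta \sum_{s} \nabla_\theta \mathcal{L}_t\bigl(\theta^{(s)}\bigr), \qquad \tau_t^{(\ell)} = U_t^{(\ell)} \Sigma_t^{(\ell)} V_t^{(\ell)\top}.
\]
Under this reading, task arithmetic $\theta_0 + \lambda \sum_t \tau_t$ acts like an aggregated gradient step, and truncating the singular spectrum yields the compression and interference reduction exploited by TSV-Merge.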
Together, these contributions provide theoretical and algorithmic foundations for model merging, supporting a paradigm in which learned capabilities can be composed, reused, and extended across models.