Generalizing the Geometry of Model Merging Through Fréchet Averages

arXiv cs.LG / 5/1/2026


Key Points

  • The paper argues that effective model merging without extra training requires symmetry-aware methods, because naive parameter-space averaging can break under architectural symmetries.
  • It proposes a general framework: perform merging as Fréchet averaging by choosing parameters that minimize a sum of geodesic distances on a suitably chosen manifold (a toy numerical sketch follows this list).
  • The authors emphasize that the critical design choice is the overall geometry—specifically the metric, manifold, and distance approximation—which defines what it means for two models to be “close.”
  • Under simplifying assumptions, the paper shows that Fréchet averaging can subsume and generalize Fisher merging.
  • For low-rank adapters (LoRA), the paper identifies a distinct quotient-manifold geometry, reviews limitations of existing LoRA merging methods, and introduces a practical algorithm with comparisons to other approaches.
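
Concretely, the Fréchet average of parameter vectors θ₁, …, θₙ with weights wᵢ is the minimizer of Σᵢ wᵢ · d(θ, θᵢ)² for the chosen geodesic distance d. The sketch below is an illustration only (the function name, variable names, and the diagonal-Fisher quadratic metric are assumptions, not the paper's implementation): with a Euclidean metric the minimizer is the plain weighted average, and with a per-parameter diagonal Fisher metric it reduces to the Fisher-merging formula, matching the fourth key point above.

```python
import numpy as np

def frechet_average(thetas, weights, fishers=None):
    """Closed-form minimizer of sum_i w_i * d(theta, theta_i)^2.

    - Euclidean metric (fishers=None): d^2 = ||theta - theta_i||^2,
      so the minimizer is the plain weighted average.
    - Diagonal quadratic metric (fishers given): d^2 = (theta - theta_i)^T diag(F_i) (theta - theta_i),
      so the minimizer is the Fisher-weighted average (Fisher merging).
    """
    thetas = np.asarray(thetas, dtype=float)           # shape (n_models, n_params)
    weights = np.asarray(weights, dtype=float)[:, None]
    if fishers is None:
        fishers = np.ones_like(thetas)                 # Euclidean special case
    fishers = np.asarray(fishers, dtype=float)
    num = (weights * fishers * thetas).sum(axis=0)     # sum_i w_i F_i theta_i (elementwise)
    den = (weights * fishers).sum(axis=0)              # sum_i w_i F_i
    return num / den

# Toy check with two 3-parameter "models".
thetas = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 0.0, 1.0])]
print(frechet_average(thetas, [0.5, 0.5]))                       # plain average: [2. 1. 2.]
print(frechet_average(thetas, [0.5, 0.5],
                      fishers=[np.array([10.0, 1.0, 1.0]),
                               np.array([1.0, 1.0, 10.0])]))     # Fisher-weighted average
```

The point of the closed form is that both familiar baselines fall out of the same objective; only the metric changes. Curved geometries generally have no closed-form Fréchet mean and require an iterative solver, which is where the paper's choice of manifold and distance approximation matters.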

Abstract

Model merging aims to combine multiple models into one without additional training. Naïve parameter-space averaging can be fragile under architectural symmetries, as its underlying geometry does not take these symmetries into account. In this work we show that not only the geometry, but also the averaging procedure itself, must be symmetry-invariant to achieve symmetry-aware merges. Consequently, we propose a general solution: merging as Fréchet averaging, i.e., selecting parameters that minimize a sum of geodesic distances on an appropriate manifold. In this view, the key design choice is the overall geometry, i.e., the choice of metric, manifold, and distance approximation, that determines what it means for two models to be "close". We show that Fréchet averaging, combined with simplifying assumptions, contains Fisher merging. Building on this, we examine the particular case of low-rank adapters (LoRA), whose symmetries induce a distinct geometry: that of a quotient manifold. We outline the limitations of current LoRA merging methods, propose a practical algorithm for this setting, and show how it compares with other commonly used approaches.
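
For LoRA, the symmetry behind the quotient-manifold view is that the factor pairs (B, A) and (B G⁻¹, G A) define the same update ΔW = BA for any invertible G. The toy check below illustrates why symmetry-invariance matters; it is not the paper's algorithm, and the product-averaging baseline at the end is only an assumed point of comparison. Naive per-factor averaging changes its output under such a reparameterization, while an average computed from the products BA does not.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 5, 2                                 # output dim, input dim, LoRA rank

# Two LoRA adapters; each update is B @ A with shape (d, k).
B1, A1 = rng.normal(size=(d, r)), rng.normal(size=(r, k))
B2, A2 = rng.normal(size=(d, r)), rng.normal(size=(r, k))

# Reparameterize the second adapter: (B2 G^-1, G A2) is the *same* adapter.
G = rng.normal(size=(r, r)) + 3 * np.eye(r)       # some invertible r x r gauge matrix
B2g, A2g = B2 @ np.linalg.inv(G), G @ A2
assert np.allclose(B2 @ A2, B2g @ A2g)

# Naive per-factor averaging: the merged update depends on the arbitrary gauge G.
naive        = ((B1 + B2)  / 2) @ ((A1 + A2)  / 2)
naive_gauged = ((B1 + B2g) / 2) @ ((A1 + A2g) / 2)
print(np.allclose(naive, naive_gauged))           # False: not symmetry-invariant

# Averaging the full products is invariant to the gauge.
prod        = (B1 @ A1 + B2  @ A2)  / 2
prod_gauged = (B1 @ A1 + B2g @ A2g) / 2
print(np.allclose(prod, prod_gauged))             # True: symmetry-invariant
```

Averaging the products is only one gauge-invariant baseline, used here to make the symmetry concrete; the paper's proposed algorithm instead works with the quotient-manifold geometry induced by this equivalence.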