Understanding and Enforcing Weight Disentanglement in Task Arithmetic

arXiv cs.AI / 4/21/2026


Key Points

  • The paper addresses why task arithmetic works in practice despite lacking a clear theoretical explanation, focusing on the concept of weight disentanglement.
  • It proposes Task-Feature Specialization (TFS) as the fundamental mechanism, proving TFS is sufficient for weight disentanglement and linking it to an observable geometric outcome.
  • The authors show that TFS implies weight vector orthogonality, and use this tractable property as a surrogate to promote disentanglement during training.
  • They introduce OrthoReg, a regularization method that enforces an orthogonal internal structure on the weight updates (ΔW) that constitute task vectors during fine-tuning.
  • Extensive experiments indicate OrthoReg significantly improves the performance of multiple task arithmetic methods, with an accompanying public code release.
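To ground the terminology above: task arithmetic builds a task vector as the difference between fine-tuned and pre-trained weights, then composes tasks by adding scaled task vectors back to the pre-trained model. A minimal sketch with hypothetical toy weights (the arrays below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical pre-trained weights and two fine-tuned variants (toy values).
theta_0 = np.array([0.5, -1.0, 2.0, 0.0])
theta_A = np.array([0.7, -1.0, 2.0, 0.1])   # fine-tuned on task A
theta_B = np.array([0.5, -0.8, 2.3, 0.0])   # fine-tuned on task B

# Task vectors: the weight change each fine-tuning run produced.
tau_A = theta_A - theta_0
tau_B = theta_B - theta_0

# Task arithmetic: add scaled task vectors to the pre-trained weights
# to obtain a multi-task model without any further training.
lam = 1.0
theta_multi = theta_0 + lam * (tau_A + tau_B)
```

Weight disentanglement is the ideal case where each τ affects only its own task's behavior, so the sum above composes tasks without interference.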

Abstract

Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, what intrinsic properties of the pre-trained model (θ₀) or the task vectors (τ_t) enable this disentanglement remains underexplored. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality. This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight for our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. We therefore propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates (ΔW) that constitute τ_t during fine-tuning, and we theoretically prove that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods. Code is available at https://github.com/RL-MIND/OrthoReg.
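The abstract does not spell out the exact regularizer; the authors' implementation is in the linked code release. One plausible reading of "enforcing an internal orthogonal structure on ΔW" is penalizing the off-diagonal entries of the Gram matrix ΔW·ΔWᵀ, which pushes the rows of a layer's weight update toward mutual orthogonality. A minimal sketch of such a penalty (an illustration, not the paper's implementation):

```python
import numpy as np

def ortho_penalty(delta_w: np.ndarray) -> float:
    """Sum of squared off-diagonal entries of ΔW ΔWᵀ.

    Zero exactly when the rows of the weight update are mutually
    orthogonal; adding this term to the fine-tuning loss discourages
    overlapping update directions.
    """
    gram = delta_w @ delta_w.T                  # pairwise row inner products
    off_diag = gram - np.diag(np.diag(gram))    # zero out the diagonal
    return float(np.sum(off_diag ** 2))

# Orthogonal rows incur no penalty; fully aligned rows are penalized.
print(ortho_penalty(np.eye(2)))       # rows orthogonal -> 0.0
print(ortho_penalty(np.ones((2, 2)))) # rows identical -> 8.0
```

In practice such a term would be weighted by a coefficient and summed over layers, with ΔW taken relative to the frozen pre-trained weights θ₀.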