Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization

arXiv cs.LG / 4/7/2026


Key Points

  • The paper studies constrained model steering for foundation models (e.g., for safety, privacy, or task-specific requirements), which is hard to optimize because gradients from the primary objective and the constraint objectives can interfere with each other.
  • Using a model-merging/spectral perspective, it explains why spectral cross-task interference arises and shows that a one-shot orthogonalization of the merged subspace can resolve it.
  • The authors connect this orthogonalization approach to gradient orthogonalization in the spectral optimizer Muon, forming the basis for their training method.
  • They introduce SIFT (spectral interference-free training), which uses a localization/intervention scheme during optimization to produce controllable updates that reduce objective–constraint conflicts.
  • Experiments on four applications—machine unlearning, safety alignment, text-to-speech adaptation, and hallucination mitigation—show SIFT outperforms control-based and control-free baselines consistently, with code released on GitHub.
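For context on the Muon connection mentioned above: Muon orthogonalizes each gradient matrix by approximately snapping its singular values to 1 via a Newton-Schulz iteration. Below is a minimal NumPy sketch of that iteration, using the commonly published Muon coefficients; the paper's exact variant and how SIFT intervenes on it may differ, and the function name here is ours.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Map G toward the nearest semi-orthogonal matrix (singular values ~1)
    using the quintic Newton-Schulz iteration popularized by Muon.
    Coefficients are the commonly cited Muon defaults."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)   # normalize so singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                       # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X   # acts as p(s) on each singular value
    return X.T if transposed else X
```

After a few steps the singular values cluster near 1, so the update direction is preserved while its spectrum is flattened, which is the property the paper's spectral-interference analysis connects to.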

Abstract

Foundation models, such as large language models (LLMs), are powerful but often require customization before deployment to satisfy practical constraints such as safety, privacy, and task-specific requirements, leading to "constrained" optimization problems for model steering and adaptation. However, solving such problems remains largely underexplored and is particularly challenging due to interference between the primary objective and constraint objectives during optimization. In this paper, we propose a subspace control framework for constrained model training. Specifically, (i) we first analyze, from a model merging perspective, how spectral cross-task interference arises and show that it can be resolved via a one-shot solution that orthogonalizes the merged subspace; (ii) we establish a connection between this solution and gradient orthogonalization in the spectral optimizer Muon; and (iii) building on these insights, we introduce SIFT (spectral interference-free training), which leverages a localization scheme to selectively intervene during optimization, enabling controllable updates that mitigate objective-constraint conflicts. We evaluate SIFT across four representative applications: (a) machine unlearning, (b) safety alignment, (c) text-to-speech adaptation, and (d) hallucination mitigation. Compared to both control-based and control-free baselines, SIFT consistently achieves substantial and robust performance improvements across all tasks. Code is available at https://github.com/OPTML-Group/SIFT.
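As a toy illustration of the "orthogonalizes the merged subspace" idea in the abstract (this is not the paper's SIFT procedure; the function and variable names below are ours, and the paper's one-shot solution and localization scheme are more involved), one can remove the component of a main-task update that lies in the dominant singular subspace of a constraint-task update, so the two updates no longer conflict along those directions:

```python
import numpy as np

def project_out_subspace(delta_main, delta_constraint, rank=2):
    """Remove the component of delta_main lying in the top-`rank`
    left singular subspace of delta_constraint (illustrative sketch)."""
    U, _, _ = np.linalg.svd(delta_constraint, full_matrices=False)
    U_r = U[:, :rank]                     # dominant constraint directions
    return delta_main - U_r @ (U_r.T @ delta_main)
```

The returned update is exactly orthogonal to the retained constraint directions, so a step along it leaves the constraint objective unchanged to first order within that subspace.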