Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning

arXiv cs.LG / 5/5/2026


Key Points

  • The paper argues that LoRA fine-tuning performance is highly sensitive to which low-rank subspace is selected at initialization, since allocating capacity to task-irrelevant directions can significantly hurt results.
  • It critiques existing initialization methods for relying mainly on pre-trained weight properties (e.g., geometry or magnitude) and instead proposes a data-aware view based on how parameter-space directions affect predictions under the downstream data distribution.
  • The authors introduce a Fisher-guided initialization framework that uses curvature information induced by the downstream data to quantify the impact of parameter perturbations and select more task-aligned LoRA directions (formalized in the sketch after this list).
  • Experiments across multiple tasks and modalities show that data-aware (Fisher-guided) initialization improves downstream performance consistently and significantly compared with prior approaches.
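
To make the curvature criterion in the third point concrete, here is a minimal formalization using the standard Fisher information matrix. The sensitivity score s(v) and the top-eigenvector selection rule are an illustrative reading of the abstract, not the paper's stated objective.

```latex
% Fisher information of the model p_theta under the downstream
% input distribution D (standard definition).
\[
F(\theta) \;=\; \mathbb{E}_{x \sim \mathcal{D}}\,
  \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[
    \nabla_\theta \log p_\theta(y \mid x)\,
    \nabla_\theta \log p_\theta(y \mid x)^{\top}
  \right]
\]
% Data-aware sensitivity of a unit direction v in parameter space:
% the second-order change in the predictive distribution when the
% weights are perturbed along v.
\[
s(v) \;=\; v^{\top} F(\theta)\, v, \qquad \|v\| = 1 .
\]
% A Fisher-guided initialization would allocate the rank-r LoRA
% subspace to directions with large s(v), i.e., the top eigenvectors
% of F, rather than to directions chosen from weight geometry alone.
```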

Abstract

LoRA adapts large language models (LLMs) by restricting updates to low-rank subspaces of the pre-trained weights. While this substantially reduces training cost, the effectiveness of adaptation depends critically on which subspace is chosen at initialization: a poor initialization that allocates capacity to task-irrelevant directions can severely hinder downstream performance. Existing initialization strategies rely primarily on intrinsic properties of the pre-trained weights, implicitly assuming that weight geometry alone reflects task relevance. Such criteria, however, overlook how the model interacts with the downstream data distribution. In this work, we formulate LoRA initialization as the problem of identifying the directions in parameter space that most influence model behavior under the target data distribution, and we argue that data-aware sensitivity, rather than weight-only magnitude, should govern the choice of adaptation subspace. Building on this perspective, we propose a Fisher-guided framework that leverages curvature information induced by the downstream data to characterize how parameter perturbations influence model predictions, yielding a principled, task-dependent criterion for selecting LoRA directions that better align adaptation with the target objective. Empirical results across diverse tasks and modalities demonstrate that data-aware initialization consistently and significantly improves downstream performance over existing approaches.
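
To illustrate what such a data-aware initialization might look like in practice, here is a minimal PyTorch sketch. It estimates a diagonal empirical Fisher from downstream batches and selects the rank-r subspace via a Fisher-weighted SVD; both choices, and the function names `empirical_fisher_diag` and `fisher_guided_lora_init`, are illustrative assumptions, not the paper's actual algorithm or API.

```python
import torch

def empirical_fisher_diag(model, loss_fn, data_loader, param_name):
    """Diagonal empirical Fisher for one weight matrix, estimated as
    the average squared gradient of the downstream loss over batches.
    A common approximation; the paper's estimator may differ."""
    W = dict(model.named_parameters())[param_name]
    fisher = torch.zeros_like(W)
    num_batches = 0
    for batch in data_loader:
        model.zero_grad()
        loss_fn(model, batch).backward()
        fisher += W.grad.detach() ** 2
        num_batches += 1
    return fisher / max(num_batches, 1)

def fisher_guided_lora_init(W, fisher, rank):
    """Pick a task-aligned rank-r subspace via an SVD of the
    Fisher-weighted weights, so directions the downstream data is
    sensitive to dominate the factorization. The elementwise
    sqrt-Fisher weighting is an illustrative assumption."""
    weighted = fisher.sqrt() * W
    U, S, Vh = torch.linalg.svd(weighted, full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    A = sqrt_s.unsqueeze(1) * Vh[:rank]   # (rank, in_features)
    B = U[:, :rank] * sqrt_s              # (out_features, rank)
    return A, B                           # LoRA update: W + B @ A
```

A full recipe would also have to decide whether to subtract B @ A from the frozen weights so that the initialized model's function is unchanged, as in residual-style initializations; the abstract does not specify this detail.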