Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning

arXiv cs.LG / 5/5/2026


Key Points

  • The paper argues that LoRA fine-tuning performance is highly sensitive to which low-rank subspace is selected at initialization, since allocating capacity to task-irrelevant directions can significantly hurt results.
  • It critiques existing initialization methods for relying mainly on pre-trained weight properties (e.g., geometry or magnitude) and instead proposes a data-aware view based on how parameter-space directions affect predictions under the downstream data distribution.
  • The authors introduce a Fisher-guided initialization framework that uses curvature information induced by the downstream data to quantify the impact of parameter perturbations and select more task-aligned LoRA directions (formalized in the sketch after this list).
  • Experiments across multiple tasks and modalities show that data-aware (Fisher-guided) initialization improves downstream performance consistently and significantly compared with prior approaches.
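
To make the curvature criterion in the third point concrete, here is a minimal formalization using the standard Fisher information matrix. The sensitivity score s(v) and the top-eigenvector selection rule are an illustrative reading of the abstract, not the paper's stated objective.

```latex
% Fisher information of the model p_theta under the downstream
% input distribution D (standard definition).
\[
F(\theta) \;=\; \mathbb{E}_{x \sim \mathcal{D}}\,
  \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[
    \nabla_\theta \log p_\theta(y \mid x)\,
    \nabla_\theta \log p_\theta(y \mid x)^{\top}
  \right]
\]
% Data-aware sensitivity of a unit direction v in parameter space:
% the second-order change in the predictive distribution when the
% weights are perturbed along v.
\[
s(v) \;=\; v^{\top} F(\theta)\, v, \qquad \|v\| = 1 .
\]
% A Fisher-guided initialization would allocate the rank-r LoRA
% subspace to directions with large s(v), i.e., the top eigenvectors
% of F, rather than to directions chosen from weight geometry alone.
```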

Abstract

LoRA adapts large language models (LLMs) by restricting updates to low-rank subspaces of the pre-trained weights. While this substantially reduces training cost, the effectiveness of adaptation depends critically on which subspace is chosen at initialization: a poor initialization that allocates capacity to task-irrelevant directions can severely hinder downstream performance. Existing initialization strategies rely primarily on intrinsic properties of the pre-trained weights, implicitly assuming that weight geometry alone reflects task relevance. Such criteria, however, overlook how the model interacts with the downstream data distribution. In this work, we formulate LoRA initialization as the problem of identifying the directions in parameter space that most influence model behavior under the target data distribution, and we argue that data-aware sensitivity, rather than weight-only magnitude, should govern the choice of adaptation subspace. Building on this perspective, we propose a Fisher-guided framework that leverages curvature information induced by the downstream data to characterize how parameter perturbations influence model predictions, yielding a principled, task-dependent criterion for selecting LoRA directions that better align adaptation with the target objective. Empirical results across diverse tasks and modalities demonstrate that data-aware initialization consistently and significantly improves downstream performance over existing approaches.
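
To illustrate what such a data-aware initialization might look like in practice, here is a minimal PyTorch sketch. It estimates a diagonal empirical Fisher from downstream batches and selects the rank-r subspace via a Fisher-weighted SVD; both choices, and the function names `empirical_fisher_diag` and `fisher_guided_lora_init`, are illustrative assumptions, not the paper's actual algorithm or API.

```python
import torch

def empirical_fisher_diag(model, loss_fn, data_loader, param_name):
    """Diagonal empirical Fisher for one weight matrix, estimated as
    the average squared gradient of the downstream loss over batches.
    A common approximation; the paper's estimator may differ."""
    W = dict(model.named_parameters())[param_name]
    fisher = torch.zeros_like(W)
    num_batches = 0
    for batch in data_loader:
        model.zero_grad()
        loss_fn(model, batch).backward()
        fisher += W.grad.detach() ** 2
        num_batches += 1
    return fisher / max(num_batches, 1)

def fisher_guided_lora_init(W, fisher, rank):
    """Pick a task-aligned rank-r subspace via an SVD of the
    Fisher-weighted weights, so directions the downstream data is
    sensitive to dominate the factorization. The elementwise
    sqrt-Fisher weighting is an illustrative assumption."""
    weighted = fisher.sqrt() * W
    U, S, Vh = torch.linalg.svd(weighted, full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    A = sqrt_s.unsqueeze(1) * Vh[:rank]   # (rank, in_features)
    B = U[:, :rank] * sqrt_s              # (out_features, rank)
    return A, B                           # LoRA update: W + B @ A
```

A full recipe would also have to decide whether to subtract B @ A from the frozen weights so that the initialized model's function is unchanged, as in residual-style initializations; the abstract does not specify this detail.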