Importance-Guided Basis Selection for Low-Rank Decomposition of Large Language Models

arXiv cs.LG · May 5, 2026

📰 News · Models & Research

Key Points

  • Low-rank decomposition can compress large language models, but performance depends critically on choosing which singular-vector bases to keep for a given task.
  • The paper argues that prior heuristics, such as Basel's approach of adapting singular-value coefficients on downstream data and then pruning bases with small re-learned magnitudes, can be misaligned with task performance because they ignore the local geometry of the loss landscape.
  • It introduces Basis Selection with Importance (BSI), which ranks and prunes bases by estimating the expected increase in task loss when each basis is removed, using a second-order Taylor expansion that combines first-order sensitivity with second-order curvature (one standard form of such a score is sketched after this list).
  • To apply this efficiently to LLMs, BSI estimates the required Hessian-diagonal information with a Hutchinson-style randomized probing method adapted to use symmetric parameter perturbations (illustrated in the code sketch after the abstract).
  • Experiments on mathematical reasoning benchmarks show BSI outperforming existing low-rank decomposition baselines, with the largest gains under deep compression; the paper also provides theoretical loss-increase bounds and sample-complexity guarantees.
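
To make the second-order criterion concrete: pruning basis i amounts to zeroing its singular value σ_i, i.e., applying the perturbation −σ_i. The expansion below shows one standard form such a score can take; the exact expression and the symbols g_i and H_ii are our illustration of the described idea, not formulas quoted from the paper.

```latex
% Diagonal second-order Taylor model of the task loss L in the
% singular values (illustrative form, not quoted from the paper).
% Zeroing basis i applies the perturbation \delta\sigma_i = -\sigma_i:
\Delta L_i \;\approx\;
    \underbrace{-\,g_i\,\sigma_i}_{\text{first-order sensitivity}}
  \;+\;
    \underbrace{\tfrac{1}{2}\,H_{ii}\,\sigma_i^{2}}_{\text{second-order curvature}},
\qquad
g_i = \frac{\partial L}{\partial \sigma_i},
\quad
H_{ii} = \frac{\partial^{2} L}{\partial \sigma_i^{2}}.
```

Bases are then ranked by this predicted loss increase, and those with the smallest |ΔL_i| are pruned first; estimating the H_ii terms at LLM scale is what the randomized probing described above addresses.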

Abstract

Low-rank decomposition is a compelling approach for compressing large language models, but its effectiveness hinges on selecting which singular-vector bases to retain for a target task. Existing methods such as Basel adapt singular-value coefficients on downstream data and prune bases with small re-learned magnitudes, a heuristic that can be misaligned with task performance because it ignores the local geometry of the loss landscape. We present Basis Selection with Importance (BSI), a principled low-rank compression framework that ranks and prunes bases by directly estimating the expected loss increase incurred when each basis is removed. BSI derives a derivative-based importance score from a second-order Taylor expansion of the task loss with respect to singular values, combining first-order sensitivity and second-order curvature to quantify pruning impact. To make this criterion practical for LLMs, we develop an efficient Hessian-diagonal estimator by adapting the Hutchinson randomized-probing method to loss curvature with symmetric parameter perturbations. We provide a comprehensive theoretical analysis, including loss-increase bounds under basis pruning, explicit propagation of Hessian-diagonal estimation error into these bounds, variance characterization tied to the Hessian spectrum, high-probability sample-complexity guarantees for achieving a target estimation accuracy, and guidance on perturbation intensity. Extensive experiments on mathematical reasoning benchmarks demonstrate that BSI consistently outperforms state-of-the-art low-rank decomposition baselines, with especially strong improvements under deep compression.
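
As one way to ground the estimator the abstract describes: for Rademacher probes z (entries ±1, so E[zzᵀ] = I and diag(H) = E[z ⊙ Hz]), a symmetric perturbation of the parameters gives a central-difference approximation of the Hessian-vector product Hz. The sketch below is a minimal PyTorch illustration under those standard assumptions; the names (hutchinson_diag_hessian, grad_at) and hyperparameters are ours, and the paper's exact estimator may differ.

```python
import torch

def grad_at(loss_fn, point):
    """Gradient of the scalar loss at `point` (treated as a fresh leaf tensor)."""
    p = point.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(loss_fn(p), p)
    return g

def hutchinson_diag_hessian(loss_fn, theta, num_probes=32, eps=1e-3):
    """Hutchinson-style estimate of diag(H) for loss_fn at theta.

    For Rademacher probes z, E[z * (H z)] = diag(H); the Hessian-vector
    product H z is approximated by a symmetric (central) difference of
    the gradient, with O(eps^2) truncation error.
    """
    diag_est = torch.zeros_like(theta)
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = (torch.rand_like(theta) < 0.5).to(theta.dtype) * 2 - 1
        # Symmetric parameter perturbation:
        # H z ~= (g(theta + eps*z) - g(theta - eps*z)) / (2*eps).
        hz = (grad_at(loss_fn, theta + eps * z)
              - grad_at(loss_fn, theta - eps * z)) / (2 * eps)
        diag_est += z * hz
    return diag_est / num_probes

# Sanity check on a quadratic with known Hessian diagonal [2., 6.]:
A = torch.diag(torch.tensor([2.0, 6.0]))
theta = torch.tensor([1.0, -1.0])
print(hutchinson_diag_hessian(lambda x: 0.5 * x @ A @ x, theta))  # ~tensor([2., 6.])
```

One property of this estimator, consistent with the abstract's variance characterization: for Rademacher probes, the per-probe variance of the i-th diagonal estimate equals the squared off-diagonal mass of row i of H, so more probes are needed when curvature couples parameters strongly, and eps trades finite-difference bias against numerical noise, which is the trade-off the paper's guidance on perturbation intensity concerns.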