Scaled Gradient Descent for Ill-Conditioned Low-Rank Matrix Recovery with Optimal Sampling Complexity

arXiv stat.ML / 4/2/2026


Key Points

  • The paper studies low-rank matrix recovery from few linear measurements, focusing on gradient-descent-based non-convex methods that often have sub-optimal sample complexity and slow convergence for ill-conditioned targets.
  • It revisits scaled gradient descent (ScaledGD), showing through a refined analysis that it achieves both the optimal sample complexity O((n_1 + n_2)r) and the improved iteration complexity O(log(1/ε)).
  • The results improve on prior tradeoffs: ScaledGD had fast iterations but sub-optimal sample complexity, while standard GD in the PSD setting had optimal sample complexity but an iteration complexity scaling poorly with the condition number.
  • The theoretical guarantees extend beyond the PSD special case to general low-rank matrix recovery, supported by numerical experiments demonstrating accelerated convergence for ill-conditioned matrices.

Abstract

The low-rank matrix recovery problem seeks to reconstruct an unknown n_1 \times n_2 rank-r matrix from m linear measurements, where m\ll n_1n_2. This problem has been extensively studied over the past few decades, leading to a variety of algorithms with solid theoretical guarantees. Among these, gradient-descent-based non-convex methods have become particularly popular due to their computational efficiency. However, these methods typically suffer from two key limitations: a sub-optimal sample complexity of O((n_1 + n_2)r^2) and an iteration complexity of O(\kappa \log(1/\epsilon)) to achieve \epsilon-accuracy, resulting in slow convergence when the target matrix is ill-conditioned. Here, \kappa denotes the condition number of the unknown matrix. Recent studies show that a preconditioned variant of GD, known as scaled gradient descent (ScaledGD), can significantly reduce the iteration complexity to O(\log(1/\epsilon)). Nonetheless, its sample complexity remains sub-optimal at O((n_1 + n_2)r^2). In contrast, a delicate virtual-sequence technique demonstrates that standard GD in the positive semidefinite (PSD) setting achieves the optimal sample complexity O((n_1 + n_2)r), but converges more slowly, with an iteration complexity of O(\kappa^2 \log(1/\epsilon)). In this paper, through a more refined analysis, we show that ScaledGD achieves both the optimal sample complexity O((n_1 + n_2)r) and the improved iteration complexity O(\log(1/\epsilon)). Notably, our results extend beyond the PSD setting to the general low-rank matrix recovery problem. Numerical experiments further validate that ScaledGD accelerates convergence for ill-conditioned matrices at the optimal sample complexity.
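To make the preconditioning idea concrete, here is a minimal NumPy sketch of the ScaledGD update for a factorized model X = L R^T. It uses the fully observed loss 0.5‖LR^T − X*‖_F² and an initialization near the truth purely for illustration; the paper works with few linear measurements and a spectral initialization, and all problem sizes, the step size η = 0.5, and the perturbation level are assumptions of this demo, not values from the paper. The key line is that each gradient is right-multiplied by (R^T R)^{-1} or (L^T L)^{-1}, which is what removes the condition-number dependence from the contraction rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 30, 20, 3

# Ill-conditioned rank-r ground truth with condition number kappa = 100
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
sigma = np.array([100.0, 10.0, 1.0])
X_star = (U * sigma) @ V.T

# Balanced initialization near the truth (illustration only; the paper
# analyzes a spectral initialization computed from the measurements)
L = (U * np.sqrt(sigma)) + 0.01 * rng.standard_normal((n1, r))
R = (V * np.sqrt(sigma)) + 0.01 * rng.standard_normal((n2, r))

eta = 0.5  # constant step size; no 1/kappa scaling needed
for _ in range(100):
    residual = L @ R.T - X_star        # gradient of 0.5 * ||L R^T - X*||_F^2
    grad_L = residual @ R
    grad_R = residual.T @ L
    # ScaledGD: precondition each factor's gradient by the Gram matrix
    # of the other factor, i.e. grad_L (R^T R)^{-1} and grad_R (L^T L)^{-1}
    L_new = L - eta * np.linalg.solve(R.T @ R, grad_L.T).T
    R_new = R - eta * np.linalg.solve(L.T @ L, grad_R.T).T
    L, R = L_new, R_new

rel_err = np.linalg.norm(L @ R.T - X_star) / np.linalg.norm(X_star)
print(rel_err)  # small relative error despite kappa = 100
```

Dropping the two `np.linalg.solve` preconditioners recovers plain factored gradient descent, whose step size and iteration count would both degrade with the condition number of X*.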