Scaled Gradient Descent for Ill-Conditioned Low-Rank Matrix Recovery with Optimal Sampling Complexity

arXiv stat.ML / 4/2/2026


Key Points

  • The paper studies low-rank matrix recovery from few linear measurements, focusing on gradient-descent-based non-convex methods that often have sub-optimal sample complexity and slow convergence for ill-conditioned targets.
  • It revisits scaled gradient descent (ScaledGD), showing through a refined analysis that it achieves both the optimal sample complexity O((n_1 + n_2)r) and the improved iteration complexity O(log(1/ε)).
  • The results improve on prior tradeoffs: ScaledGD had fast iterations but sub-optimal sample complexity, while standard GD in the PSD setting had optimal sample complexity but an iteration complexity scaling poorly with the condition number.
  • The theoretical guarantees extend beyond the PSD special case to general low-rank matrix recovery, supported by numerical experiments demonstrating accelerated convergence for ill-conditioned matrices.

Abstract

The low-rank matrix recovery problem seeks to reconstruct an unknown n_1 \times n_2 rank-r matrix from m linear measurements, where m\ll n_1n_2. This problem has been extensively studied over the past few decades, leading to a variety of algorithms with solid theoretical guarantees. Among these, gradient-descent-based non-convex methods have become particularly popular due to their computational efficiency. However, these methods typically suffer from two key limitations: a sub-optimal sample complexity of O((n_1 + n_2)r^2) and an iteration complexity of O(\kappa \log(1/\epsilon)) to achieve \epsilon-accuracy, resulting in slow convergence when the target matrix is ill-conditioned. Here, \kappa denotes the condition number of the unknown matrix. Recent studies show that a preconditioned variant of GD, known as scaled gradient descent (ScaledGD), can significantly reduce the iteration complexity to O(\log(1/\epsilon)). Nonetheless, its sample complexity remains sub-optimal at O((n_1 + n_2)r^2). In contrast, a delicate virtual-sequence technique demonstrates that standard GD in the positive semidefinite (PSD) setting achieves the optimal sample complexity O((n_1 + n_2)r), but converges more slowly, with an iteration complexity of O(\kappa^2 \log(1/\epsilon)). In this paper, through a more refined analysis, we show that ScaledGD achieves both the optimal sample complexity O((n_1 + n_2)r) and the improved iteration complexity O(\log(1/\epsilon)). Notably, our results extend beyond the PSD setting to the general low-rank matrix recovery problem. Numerical experiments further validate that ScaledGD accelerates convergence for ill-conditioned matrices at the optimal sample complexity.
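To make the preconditioning idea concrete, here is a minimal NumPy sketch of the ScaledGD update for a factorized model X = L R^T. It uses the fully observed loss 0.5‖LR^T − X*‖_F² and an initialization near the truth purely for illustration; the paper works with few linear measurements and a spectral initialization, and all problem sizes, the step size η = 0.5, and the perturbation level are assumptions of this demo, not values from the paper. The key line is that each gradient is right-multiplied by (R^T R)^{-1} or (L^T L)^{-1}, which is what removes the condition-number dependence from the contraction rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 30, 20, 3

# Ill-conditioned rank-r ground truth with condition number kappa = 100
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
sigma = np.array([100.0, 10.0, 1.0])
X_star = (U * sigma) @ V.T

# Balanced initialization near the truth (illustration only; the paper
# analyzes a spectral initialization computed from the measurements)
L = (U * np.sqrt(sigma)) + 0.01 * rng.standard_normal((n1, r))
R = (V * np.sqrt(sigma)) + 0.01 * rng.standard_normal((n2, r))

eta = 0.5  # constant step size; no 1/kappa scaling needed
for _ in range(100):
    residual = L @ R.T - X_star        # gradient of 0.5 * ||L R^T - X*||_F^2
    grad_L = residual @ R
    grad_R = residual.T @ L
    # ScaledGD: precondition each factor's gradient by the Gram matrix
    # of the other factor, i.e. grad_L (R^T R)^{-1} and grad_R (L^T L)^{-1}
    L_new = L - eta * np.linalg.solve(R.T @ R, grad_L.T).T
    R_new = R - eta * np.linalg.solve(L.T @ L, grad_R.T).T
    L, R = L_new, R_new

rel_err = np.linalg.norm(L @ R.T - X_star) / np.linalg.norm(X_star)
print(rel_err)  # small relative error despite kappa = 100
```

Dropping the two `np.linalg.solve` preconditioners recovers plain factored gradient descent, whose step size and iteration count would both degrade with the condition number of X*.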