Analysis of the Nyström method with sequential ridge leverage scores

arXiv cs.LG / 4/23/2026


Key Points

  • The paper addresses large-scale kernel ridge regression by using Nyström-style column subsampling to avoid storing the full kernel matrix, noting that the subsampling distribution drives the tradeoffs.
  • It leverages recent insights that sampling proportional to ridge leverage scores (RLSs) yields strong reconstruction guarantees, but tackles the fact that exact RLSs are costly to compute.
  • The authors propose INK-ESTIMATE, a sequential algorithm that incrementally estimates RLSs while maintaining only a small sketch of the kernel matrix.
  • INK-ESTIMATE enables a single pass over the kernel matrix, updates the sketch without needing previously seen columns, and uses a fixed small space budget depending only on the kernel’s effective dimension.
  • The method provides approximation guarantees for both the distance between the true and reconstructed kernel matrices and the statistical risk of the approximate KRR solution at every intermediate step.
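To make the sampling step concrete, here is a minimal sketch of ridge-leverage-score sampling followed by the standard Nyström reconstruction. This is the batch baseline the paper improves on, not the sequential INK-ESTIMATE algorithm: it computes exact RLSs from the full kernel matrix (the expensive step the paper avoids), and the function names are illustrative.

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    # i-th RLS: l_i = (K (K + lam I)^{-1})_{ii}; each score lies in [0, 1],
    # and their sum is the effective dimension of the kernel matrix.
    n = K.shape[0]
    return np.diag(np.linalg.solve(K + lam * np.eye(n), K))

def nystrom_from_rls(K, lam, m, rng):
    # Sample m columns with probability proportional to the RLSs, then
    # form the standard Nystrom reconstruction C W^+ C^T.
    scores = ridge_leverage_scores(K, lam)
    p = scores / scores.sum()
    idx = rng.choice(K.shape[0], size=m, replace=True, p=p)
    C = K[:, idx]                 # sampled columns
    W = K[np.ix_(idx, idx)]       # intersection block
    return C @ np.linalg.pinv(W) @ C.T
```

When the kernel matrix has low effective rank, a small number of leverage-sampled columns already reconstructs it accurately, which is the reconstruction guarantee the Key Points refer to.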

Abstract

Large-scale kernel ridge regression (KRR) is limited by the need to store a large kernel matrix K_t. To avoid storing the entire matrix, Nyström methods subsample a subset of its columns and efficiently find an approximate KRR solution on the reconstructed matrix. The chosen subsampling distribution in turn governs the statistical and computational tradeoffs. For KRR problems, recent works show that sampling proportional to the ridge leverage scores (RLSs) provides strong reconstruction guarantees for the approximation. While exact RLSs are as difficult to compute as a KRR solution itself, they can be approximated well enough for sampling. In this paper, we study KRR problems in a sequential setting and introduce the INK-ESTIMATE algorithm, which incrementally computes estimates of the RLSs. INK-ESTIMATE maintains a small sketch of K_t that is used at each step to compute an intermediate estimate of the RLSs. First, the sketch update does not require access to previously seen columns, so a single pass over the kernel matrix suffices. Second, the algorithm runs within a fixed, small space budget that depends only on the effective dimension of the kernel matrix. Finally, the sketch provides strong approximation guarantees on the distance between the true kernel matrix and its reconstruction, and on the statistical risk of the approximate KRR solution, with all guarantees holding at every intermediate step.
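For readers unfamiliar with how a subsampled column set yields an approximate KRR solution, the following sketch shows one standard Nyström-KRR formulation: restrict the solution to the span of the sampled columns and solve a small regularized system. The kernel choice (RBF), the helper names, and the objective min_a ||K_nm a - y||^2 + lam * a^T K_mm a are illustrative assumptions, not the paper's specific estimator.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_krr(X, y, idx, lam, gamma=1.0):
    # Fit KRR restricted to the span of the sampled columns `idx`:
    # solve (K_nm^T K_nm + lam K_mm) a = K_nm^T y, an m x m system
    # instead of the full n x n one.
    K_nm = rbf_kernel(X, X[idx], gamma)
    K_mm = rbf_kernel(X[idx], X[idx], gamma)
    a = np.linalg.solve(K_nm.T @ K_nm + lam * K_mm, K_nm.T @ y)
    # Predictions only need kernel evaluations against the m landmarks.
    return lambda Z: rbf_kernel(Z, X[idx], gamma) @ a
```

The point of the construction is that both fitting and prediction cost scale with the number of sampled columns m rather than with n, which is why the quality of the sampling distribution (uniform vs. RLS-based) matters so much.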