High-Dimensional Private Linear Regression with Optimal Rates

arXiv stat.ML · April 28, 2026

💬 Opinion · Models & Research

Key Points

  • The paper studies differentially private (DP) linear regression in the high-dimensional random-data regime where the dimension-to-sample ratio satisfies d/n → γ.
  • It analyzes a class of one-pass DP gradient descent (DP-GD) algorithms under (ρ²/2)-zero-concentrated DP (zCDP), deriving a deterministic equivalent of the algorithm's trajectory via a system of ordinary differential equations (ODEs).
  • Using this framework, the authors quantify how setting the gradient clipping constant smaller than the typical per-sample gradient norm (an approach that often improves empirical performance) affects convergence and risk.
  • For well-conditioned data, they show DP-GD can achieve non-asymptotic risk O(γ + γ²/ρ²) with appropriately chosen clipping constant and learning rate, and they prove this rate is minimax optimal.
  • For ill-conditioned data with power-law covariance spectra, they derive a different (power-like) scaling of the risk in γ and show how the scaling exponent changes with the privacy parameter ρ, also highlighting benefits from aggressive clipping and decaying learning-rate schedules.
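To make the algorithm class concrete, here is a minimal sketch of one-pass DP-GD for linear regression with per-sample gradient clipping and Gaussian noise. The function name, the noise calibration `sigma = 2 * clip_c / rho`, and the hyperparameters are illustrative assumptions for a generic (ρ²/2)-zCDP one-pass scheme, not the paper's exact algorithm.

```python
import numpy as np

def one_pass_dp_gd(X, y, clip_c, lr, rho):
    """Sketch of one-pass DP-GD for least squares: each sample is visited
    exactly once, its gradient is clipped to norm clip_c, and Gaussian
    noise is added. Calibration is illustrative: under swap adjacency the
    clipped per-sample gradient has L2-sensitivity 2*clip_c, and since each
    sample influences only one step, a single Gaussian-mechanism release
    with sigma = 2*clip_c/rho suffices for a (rho^2/2)-zCDP-style budget."""
    n, d = X.shape
    theta = np.zeros(d)
    sigma = 2 * clip_c / rho
    for i in range(n):
        g = (X[i] @ theta - y[i]) * X[i]                        # per-sample gradient
        g = g * min(1.0, clip_c / (np.linalg.norm(g) + 1e-12))  # clip to norm clip_c
        theta = theta - lr * (g + sigma * np.random.randn(d))   # noisy step
    return theta
```

Note that choosing `clip_c` below the typical per-sample gradient norm biases the clipped gradients but shrinks the injected noise proportionally, which is the trade-off the paper's ODE analysis quantifies.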

Abstract

Differentially private (DP) linear regression has received significant attention in the recent theoretical literature, with several approaches proposed to improve error rates. Our work considers the popular high-dimensional regime with random data, where the number of training samples n and the input dimension d grow at a proportional rate d / n \to \gamma, and it studies a family of one-pass DP gradient descent (DP-GD) algorithms satisfying (\rho^2 / 2)-zero-concentrated DP. In this setting, we establish a deterministic equivalent for the DP-GD trajectory in terms of a system of ordinary differential equations. This allows us to analyze the effect of gradient clipping constants that are smaller than the typical norm of the per-sample gradients, a setup shown to improve performance in practice. For well-conditioned data, we show that DP-GD, upon properly choosing the clipping constant and learning rate, achieves a non-asymptotic risk of O(\gamma + \gamma^2 / \rho^2), and we establish that this rate is minimax optimal. Then, we consider the ill-conditioned case where the data covariance spectrum follows a power-law distribution, and we show that the risk displays a power-like scaling law in \gamma, highlighting the change in the exponent as a function of the privacy parameter \rho. Overall, our analysis demonstrates the benefits of practical algorithmic design choices, including aggressive gradient clipping and decaying learning rate schedules.
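For readers unfamiliar with zCDP, the following standard Gaussian-mechanism calibration (a textbook fact, not taken from the paper) explains why a smaller clipping constant permits proportionally less noise at a fixed privacy budget:

```latex
% Gaussian mechanism under zCDP (Bun-Steinke):
% releasing f(x) + \mathcal{N}(0, \sigma^2 I_d), where f has
% \ell_2-sensitivity \Delta, satisfies \frac{\Delta^2}{2\sigma^2}\text{-zCDP}.
% Matching the budget \frac{\rho^2}{2}\text{-zCDP} requires
%   \frac{\Delta^2}{2\sigma^2} = \frac{\rho^2}{2}
%   \quad\Longrightarrow\quad \sigma = \frac{\Delta}{\rho}.
% With per-sample gradients clipped to norm c, the sensitivity \Delta
% scales linearly in c, so the noise scale \sigma does as well: halving
% the clipping constant halves the injected noise at the same \rho.
```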