Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

arXiv stat.ML / 3/24/2026


Key Points

  • The paper studies learning low-degree spherical polynomials on the unit sphere using an over-parameterized two-layer neural network with an augmented feature representation.
  • It introduces a new training method, Gradient Descent with Projection (GDP), and proves improved sample complexity: for target regression risk ε, the required number of samples scales roughly as n ≍ log(4/δ)·d^{k0}/ε with high probability.
  • The authors show this rate is nearly unimprovable by relating the network’s achieved regression risk to a nonparametric rate of order log(4/δ)·d^{k0}/n.
  • They compare against minimax optimal performance for regression with a kernel of rank Θ(d^{k0}), concluding the GDP-trained network attains a nearly minimax optimal rate.
  • For the practical setting where the true polynomial degree k0 is unknown, the paper provides a provable adaptive degree-selection algorithm that recovers k0 while preserving the nearly optimal regression rate. The paper also claims to be the first to obtain such nearly optimal bounds with the ReLU activation and algorithmic guarantees, going beyond the NTK regime.
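The core mechanism named in the title, gradient descent with projection, can be illustrated with a minimal sketch. This is not the paper's exact GDP algorithm: the augmented features are omitted, and the projection set (an L2 ball of radius R around the initialization), architecture, and hyperparameters below are illustrative assumptions. The sketch only shows the generic pattern of alternating a gradient step with a projection back onto a constraint set.

```python
import numpy as np

# Minimal sketch of projected gradient descent for a two-layer ReLU network.
# NOTE: this is NOT the paper's exact GDP method; the projection set (an L2
# ball of radius R around the initialization) and all hyperparameters are
# illustrative assumptions.

rng = np.random.default_rng(0)
d, m, n = 5, 64, 200                              # input dim, hidden width, samples

# Inputs on the unit sphere; target is a simple degree-2 spherical polynomial.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] * X[:, 1]

W = rng.normal(size=(m, d)) / np.sqrt(d)          # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer
W0, R, lr = W.copy(), 1.0, 0.2

def predict(W):
    return np.maximum(X @ W.T, 0.0) @ a           # two-layer ReLU network output

mse_init = np.mean((predict(W0) - y) ** 2)

for _ in range(1000):
    H = X @ W.T                                   # pre-activations, shape (n, m)
    r = predict(W) - y                            # residuals, shape (n,)
    G = ((H > 0) * np.outer(r, a)).T @ X / n      # gradient of (1/2)*MSE w.r.t. W
    W -= lr * G                                   # gradient step
    D = W - W0                                    # projection step: clip the
    nrm = np.linalg.norm(D)                       # deviation from init to radius R
    if nrm > R:
        W = W0 + D * (R / nrm)

mse = np.mean((predict(W) - y) ** 2)
```

The projection after each step keeps the weights inside the constraint set throughout training, which is what makes norm-based generalization arguments tractable; the specific constraint set used by the paper is not reproduced here.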

Abstract

In this paper, we study the problem of learning a low-degree spherical polynomial of degree k_0 = \Theta(1) \ge 1 defined on the unit sphere in \mathbb{R}^d by training an over-parameterized two-layer neural network with augmented features. Our main result is a significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk \epsilon \in (0, \Theta(d^{-k_0})], an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) algorithm requires a sample complexity of n \asymp \log(4/\delta) \cdot d^{k_0}/\epsilon with probability 1-\delta for \delta \in (0,1), in contrast with the representative sample complexity \Theta(d^{k_0} \max\{\epsilon^{-2}, \log d\}). Moreover, this sample complexity is nearly unimprovable, since the trained network attains a nearly optimal nonparametric regression risk of order \Theta(\log(4/\delta) \cdot d^{k_0}/n) with probability at least 1-\delta. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank \Theta(d^{k_0}) is \Theta(d^{k_0}/n), so the rate of the nonparametric regression risk of the network trained by GDP is nearly minimax optimal. In the case that the ground-truth degree k_0 is unknown, we present a novel and provable adaptive degree selection algorithm which identifies the true degree and achieves the same nearly optimal regression rate. To the best of our knowledge, this is the first time that a nearly optimal risk bound has been obtained by training an over-parameterized neural network with a popular activation function (ReLU) and an algorithmic guarantee for learning low-degree spherical polynomials. Due to the feature learning capability of GDP, our results go beyond the regular Neural Tangent Kernel (NTK) regime.
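The adaptive degree selection idea can be illustrated with a simple, hypothetical holdout-based rule: fit estimators of increasing degree and pick the smallest degree whose held-out risk is close to the best observed risk. This is only a high-level stand-in; the paper's actual selection rule and its estimator (a GDP-trained network rather than least squares on monomial features) are not reproduced here, and the 10% tolerance below is an arbitrary choice for illustration.

```python
import numpy as np
from itertools import combinations_with_replacement

# Hypothetical sketch of adaptive degree selection via a holdout risk curve.
# The paper's actual algorithm differs; plain least squares on monomial
# features stands in for the trained network purely for illustration.

rng = np.random.default_rng(1)
d, n, k0 = 4, 2000, 2
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)           # inputs on the unit sphere
y = 3.0 * X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=n)  # noisy degree-2 target

X_tr, y_tr = X[:1500], y[:1500]
X_va, y_va = X[1500:], y[1500:]

def poly_features(X, k):
    """All monomials of total degree <= k (feasible only for small d and k)."""
    cols = [np.ones(len(X))]
    for deg in range(1, k + 1):
        for combo in combinations_with_replacement(range(X.shape[1]), deg):
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.stack(cols, axis=1)

def holdout_risk(k):
    F_tr, F_va = poly_features(X_tr, k), poly_features(X_va, k)
    w, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)     # degree-k least squares fit
    return np.mean((F_va @ w - y_va) ** 2)              # held-out regression risk

risks = [holdout_risk(k) for k in range(5)]             # candidate degrees 0..4
# Select the smallest degree whose risk is within 10% of the best risk.
k_hat = next(k for k, r in enumerate(risks) if r <= 1.1 * min(risks))
```

On this synthetic example the held-out risk drops sharply once the candidate degree reaches the true degree and then plateaus near the noise level, so the smallest near-optimal degree recovers k0 = 2; the paper's algorithm provides this recovery with a proof rather than a heuristic threshold.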