Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

arXiv stat.ML / 5/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper reports new “dynamical” scaling laws that describe how deep learning performance changes throughout training, not just at the end of convergence.
  • It identifies two norm-based complexity measures that govern how the learning curve evolves, and shows that, combined, they reproduce the classic power-law test-error scaling at convergence (see the schematic form after this list).
  • The results are validated across multiple model families (CNNs, ResNets, Vision Transformers) and datasets (MNIST, CIFAR-10, CIFAR-100).
  • The authors provide analytical evidence using a single-layer perceptron with logistic loss, explaining the scaling via implicit bias from gradient-based training.
  • Overall, the work links training dynamics, scaling regularities, and interpretability-related foundations through the lens of implicit bias.
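
For orientation, the “classic” scaling the key points refer to is the standard power-law form sketched below. This is the textbook shape only, not the paper's two dynamical laws, whose exact expressions are given in the paper itself.

```latex
% Standard (static) neural scaling law: test error E decays as a power
% of a resource R (dataset size, parameter count, or compute), with an
% empirically fitted exponent \alpha > 0 and prefactor c.
E(R) \approx c \, R^{-\alpha}
\quad\Longleftrightarrow\quad
\log E(R) \approx \log c - \alpha \log R .
```

Per the abstract, the paper's two dynamical laws play an analogous role along the training trajectory, with norm-based complexity measures in place of the static resource R, and they recover this form at convergence.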

Abstract

Scaling laws in deep learning -- empirical power-law relationships linking model performance to resource growth -- have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training. In this work, we describe a richer picture by analyzing the entire training dynamics: we identify two novel “dynamical” scaling laws that govern how performance evolves as a function of different norm-based complexity measures. Combined, our new laws recover the well-known scaling for test error at convergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10, and CIFAR-100. Furthermore, we provide analytical support using a single-layer perceptron trained with logistic loss, where we derive the new dynamical scaling laws and explain them through the implicit bias induced by gradient-based training.
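
As a rough illustration of the implicit-bias mechanism the abstract invokes, the sketch below (a toy construction, not the authors' code or experimental setup) runs gradient descent with logistic loss on a single-layer perceptron over linearly separable data. The known behavior in this setting is that the weight norm grows roughly logarithmically in time while the direction w/||w|| converges to the max-margin separator.

```python
# Toy illustration (not the authors' code): gradient descent on the
# logistic loss for a single-layer perceptron over separable data.
# Known implicit-bias behavior: ||w|| grows ~ log t while w/||w||
# converges to the max-margin direction.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: labels come from a fixed "teacher" direction.
n, d = 200, 20
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

def grad(w):
    # Gradient of the mean logistic loss log(1 + exp(-y * <w, x>)).
    m = np.clip(y * (X @ w), -500.0, 500.0)  # clip margins for numerical safety
    s = 1.0 / (1.0 + np.exp(m))              # equals sigmoid(-m)
    return (-(y * s)[:, None] * X).mean(axis=0)

w = np.zeros(d)
lr = 0.5
for t in range(1, 100_001):
    w -= lr * grad(w)
    if t in (10, 100, 1_000, 10_000, 100_000):
        margin = (y * (X @ w)).min() / np.linalg.norm(w)
        print(f"t={t:>7}  ||w||={np.linalg.norm(w):7.3f}  "
              f"normalized margin={margin:.4f}")
```

Running it, the printed norm keeps growing across decades of t while the normalized margin steadily improves, consistent with the implicit-bias picture on which the paper's perceptron analysis is built.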