What do near-optimal learning rate schedules look like?
arXiv cs.LG / 3/12/2026
Key Points
- The paper designs a search procedure that finds near-optimal learning-rate schedule shapes within a parameterized family, factoring out the base learning rate so that shapes can be compared fairly (see the first sketch after this list).
- It evaluates the approach on three workloads (linear regression, CIFAR-10 image classification, and WikiText-103 language modeling) and finds near-optimal schedules in practice.
- The results show warmup and decay are robust features of good schedules, while commonly used schedule families are not optimal for these workloads.
- Weight decay can strongly affect the optimal schedule shape, revealing important interactions between hyperparameters (one plausible mechanism is illustrated in the second sketch below).
- The authors claim these results constitute some of the most comprehensive findings on near-optimal schedule shapes for deep neural network training to date.
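As a concrete illustration of the setup in the first key point, here is a minimal sketch of a unit-peak schedule shape with the base learning rate factored out. The warmup-plus-polynomial-decay form, the `warmup_frac` and `decay_power` parameters, and the function names are illustrative assumptions; the paper searches over its own parameterized family, which may differ.

```python
def schedule_shape(step: int, total_steps: int,
                   warmup_frac: float = 0.05, decay_power: float = 1.0) -> float:
    """Unit-peak schedule shape: linear warmup, then polynomial decay.

    Illustrative parameterization only; the paper's searched family
    may be richer than warmup + polynomial decay.
    """
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return step / warmup_steps  # linear warmup from 0 up to the peak of 1.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return (1.0 - progress) ** decay_power  # decay from 1.0 down to 0.0


def lr_at(step: int, total_steps: int, base_lr: float, **shape_kwargs) -> float:
    # The shape peaks at exactly 1.0, so base_lr is a separate, sweepable
    # scalar: two shapes can be compared fairly by tuning base_lr for each.
    return base_lr * schedule_shape(step, total_steps, **shape_kwargs)


if __name__ == "__main__":
    total = 1_000
    for step in (0, 25, 50, 500, 999):
        print(step, lr_at(step, total, base_lr=3e-4, decay_power=2.0))
```

Because every shape peaks at exactly 1.0, sweeping `base_lr` independently for each candidate shape puts shapes on equal footing, which is the fairness property the first key point describes.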
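On the weight-decay interaction, one plausible mechanism (an assumption here, not a claim from the paper) is that decoupled weight decay in the AdamW style is scaled by the current learning rate, so the schedule shape also reshapes the effective regularization over training. A toy scalar sketch:

```python
def decoupled_weight_decay_step(param: float, grad_update: float,
                                lr: float, weight_decay: float) -> float:
    """One AdamW-style decoupled update on a scalar parameter (toy sketch).

    The decay term is multiplied by the scheduled lr, as in PyTorch's AdamW,
    so a schedule with a high peak and a long tail applies strong
    regularization early and almost none late, even at constant weight_decay.
    """
    param *= 1.0 - lr * weight_decay   # decay scaled by the scheduled lr
    return param - lr * grad_update    # then the usual optimizer step
```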