(How) Learning Rates Regulate Catastrophic Overtraining
arXiv cs.LG / 4/16/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper examines "catastrophic overtraining": the phenomenon in which extended pretraining leaves an LLM's underlying capabilities more vulnerable to degradation during subsequent supervised fine-tuning (SFT).
- It analyzes catastrophic forgetting during fine-tuning through the lens of implicit learning-rate regularization, showing that models fine-tuned with large versus small learning rates converge to qualitatively different behaviors even when optimized to the same SFT loss (see the first sketch after this list).
- The authors connect forgetting to overtraining by arguing that learning-rate decay during pretraining increases the sharpness of the pretrained model, and that sharper pretrained models suffer worse catastrophic forgetting during SFT (see the second sketch after this list).
- Overall, the work proposes a mechanism explaining how optimization dynamics across pretraining and fine-tuning interact to produce catastrophic overtraining.
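
A minimal sketch (assuming PyTorch) of the second point: fine-tune copies of the same "pretrained" model with a large versus a small learning rate until both reach the same SFT loss, then compare how far each moved from the pretrained weights as a crude forgetting proxy. The toy model, data, and thresholds are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch only: toy model and data, not the paper's setup.
import copy
import torch

torch.manual_seed(0)

def make_model():
    return torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                               torch.nn.Linear(64, 1))

# Stand-ins for a "pretrained" model and a fine-tuning dataset.
pretrained = make_model()
x_ft = torch.randn(512, 10)
w_true = torch.randn(10, 1) / 10 ** 0.5
y_ft = x_ft @ w_true + 0.05 * torch.randn(512, 1)

def finetune_to_loss(lr, target_loss=0.1, max_steps=20_000):
    """Run full-batch SGD until the SFT loss drops below `target_loss`."""
    model = copy.deepcopy(pretrained)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for step in range(max_steps):
        loss = torch.nn.functional.mse_loss(model(x_ft), y_ft)
        if loss.item() < target_loss:
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model, loss.item(), step

def drift_from_pretrained(model):
    """L2 distance from the pretrained weights: a crude forgetting proxy."""
    with torch.no_grad():
        return torch.sqrt(sum((p - q).pow(2).sum()
                              for p, q in zip(model.parameters(),
                                              pretrained.parameters()))).item()

# Both runs stop at (roughly) the same SFT loss, yet typically end up at
# different distances from the pretrained weights.
for lr in (1e-1, 1e-2):
    model, loss, steps = finetune_to_loss(lr)
    print(f"lr={lr:g}: loss={loss:.3f} after {steps} steps, "
          f"drift={drift_from_pretrained(model):.3f}")
```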
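
And a sketch of how one might probe the sharpness claim: estimating the top Hessian eigenvalue of the loss, one common operationalization of sharpness, via power iteration on Hessian-vector products, which PyTorch's double-backward supports. Again, the toy model and data are assumptions; the paper's sharpness measure and models may differ.

```python
# Illustrative sketch only: toy model and data; "sharpness" here means the
# top Hessian eigenvalue of the loss, one common (assumed) definition.
import torch

def top_hessian_eigenvalue(loss, params, iters=30):
    """Estimate the largest-magnitude Hessian eigenvalue of `loss` w.r.t.
    `params` by power iteration on Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]  # normalize the probe direction
        # Hessian-vector product: differentiate <grad, v> w.r.t. params.
        hv = torch.autograd.grad(sum((g * u).sum() for g, u in zip(grads, v)),
                                 params, retain_graph=True)
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
params = [p for p in model.parameters() if p.requires_grad]
print("estimated sharpness (top Hessian eigenvalue):",
      top_hessian_eigenvalue(loss, params))
```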