(How) Learning Rates Regulate Catastrophic Overtraining

arXiv cs.LG / 4/16/2026


Key Points

  • The paper examines how supervised fine-tuning (SFT) can cause “catastrophic overtraining,” potentially degrading an LLM’s underlying capabilities after long pretraining.
  • It analyzes catastrophic forgetting during finetuning through the lens of implicit learning-rate regularization, showing that even when optimized to the same SFT loss, models finetuned with large versus small learning rates converge to qualitatively different solutions.
  • The authors connect forgetting to overtraining by arguing that learning-rate decay increases the sharpness of the pretrained model, which then worsens catastrophic forgetting during SFT.
  • Overall, the work proposes a mechanism explaining how optimization dynamics across pretraining and finetuning interact to produce overtraining.
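The "sharpness" in the proposed mechanism is typically the largest eigenvalue of the loss Hessian at the pretrained weights. As a hedged illustration (not the paper's actual method), the sketch below estimates that quantity via power iteration on finite-difference Hessian-vector products, using a toy quadratic loss whose true sharpness is known; the loss, `grad`, and `sharpness` helper are all hypothetical constructions for this example:

```python
import numpy as np

# Toy loss: L(w) = 0.5 * w^T H w with a known Hessian, so the true
# sharpness (top eigenvalue, here 4.0) can be checked against the estimate.
H = np.diag([4.0, 1.0, 0.25])

def grad(w):
    # Gradient of the quadratic loss above.
    return H @ w

def sharpness(w, grad_fn, iters=50, eps=1e-4, seed=0):
    """Estimate the top Hessian eigenvalue at w by power iteration,
    using central finite differences of the gradient as Hessian-vector
    products (avoids forming the Hessian explicitly)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
        v = hv / np.linalg.norm(hv)
    hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
    return float(v @ hv)  # Rayleigh quotient at the converged direction

w = np.array([1.0, 1.0, 1.0])
print(round(sharpness(w, grad), 3))  # ≈ 4.0, the top eigenvalue of H
```

Under the paper's account, a pretrained checkpoint reaching a higher value of this kind of sharpness measure (e.g., after learning-rate decay in long pretraining) would be more prone to catastrophic forgetting when SFT perturbs its weights.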

Abstract

Supervised fine-tuning (SFT) is a common first stage of LLM post-training, teaching the model to follow instructions and shaping its behavior as a helpful assistant. At the same time, SFT may harm the fundamental capabilities of an LLM, particularly after long pretraining: a phenomenon known as catastrophic overtraining (Springer et al., 2025). To understand overtraining, we first investigate catastrophic forgetting in finetuning through the lens of implicit regularization of the learning rate. For models trained to the same SFT loss, we identify how the learning rate mediates optimization: finetuning with large and small steps converges to qualitatively different models. Next, we link forgetting to overtraining: learning rate decay increases the sharpness of the pretrained model, which in turn exacerbates catastrophic forgetting during SFT, leading to overtraining. Our findings paint a picture of the overtraining mechanism in LLMs and broadly contribute to the understanding of the interplay between optimization dynamics during pretraining and finetuning.