A Quantitative Characterization of Forgetting in Post-Training
arXiv cs.LG / 3/13/2026
Key Points
- The paper introduces a two-mode mixture abstraction (representing old and new tasks) to theoretically characterize forgetting in continual post-training of generative models, defining two forms: mass forgetting and old-component drift.
- In the equal-covariance Gaussian setting, forward-KL training on the new distribution drives the old mixture weight to zero (mass forgetting). Reverse-KL objectives, by contrast, converge to the target and cause drift only through overlap-gated misassignment, which is controlled by the Bhattacharyya coefficient and decays exponentially as mode separation grows.
- The authors show how replay interacts with these objectives: for forward-KL, replay must modify the training distribution to change the population optimum; for reverse-KL, replay leaves the objective unchanged but prevents finite-batch old-mode starvation through bounded importance weighting.
- They analyze three post-training methods (SDFT, TTT-Discover, OAPL) and derive explicit conditions under which each retains old mass or exhibits overlap-controlled drift.
- Overall, forgetting can be quantified precisely in terms of the interaction among divergence direction, geometric behavioral overlap, sampling regime, and the visibility of past behavior during training.
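The overlap quantity in the second point has a closed form in the setting the paper analyzes. For two one-dimensional Gaussians with equal variance, the Bhattacharyya distance reduces to D_B = (Δμ)² / (8σ²) and the coefficient is BC = exp(−D_B), so overlap decays exponentially as the old and new modes separate. A minimal sketch (the function name is illustrative, not from the paper):

```python
import math

def bhattacharyya_coefficient(mu_old: float, mu_new: float, sigma: float) -> float:
    """Bhattacharyya coefficient for two 1-D Gaussians with equal variance.

    With equal covariances the distance reduces to
    D_B = (mu_old - mu_new)^2 / (8 * sigma^2), and BC = exp(-D_B),
    so the overlap gate closes exponentially as mode separation grows.
    """
    d_b = (mu_old - mu_new) ** 2 / (8.0 * sigma ** 2)
    return math.exp(-d_b)

# Identical modes overlap fully; separated modes overlap exponentially less.
for separation in (0.0, 1.0, 2.0, 4.0):
    print(separation, bhattacharyya_coefficient(0.0, separation, 1.0))
```

At zero separation the coefficient is exactly 1 (full overlap, maximal misassignment risk); at four standard deviations of separation it has fallen to exp(−2) ≈ 0.135, which is the sense in which drift is "overlap-gated."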
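The "finite-batch old-mode starvation" in the replay point is also easy to make concrete: if the model currently assigns weight π_old to the old mode, a batch of B i.i.d. samples misses that mode entirely with probability (1 − π_old)^B, which approaches 1 as π_old shrinks. The sketch below illustrates that calculation and a clipped importance weight of the kind the summary describes for reverse-KL replay; the function names and the specific clipping scheme are illustrative assumptions, not the paper's exact algorithm:

```python
def starvation_probability(pi_old: float, batch_size: int) -> float:
    """Probability that a batch sampled i.i.d. from the current model
    contains zero old-mode examples: (1 - pi_old) ** batch_size."""
    return (1.0 - pi_old) ** batch_size

def clipped_importance_weight(p_model: float, q_replay: float,
                              w_max: float = 10.0) -> float:
    """Bounded importance weight for a replayed sample.

    Replayed old-mode data is drawn from a replay buffer density q_replay
    rather than the model density p_model; reweighting by p/q keeps the
    population objective unchanged, and clipping to [0, w_max] keeps the
    weight bounded so a single replayed sample cannot dominate an update.
    """
    raw = p_model / max(q_replay, 1e-12)  # guard against division by zero
    return min(raw, w_max)

# As the old mixture weight decays, starvation becomes near-certain
# even for moderately large batches, which is what replay prevents.
for pi_old in (0.5, 0.1, 0.01):
    print(pi_old, starvation_probability(pi_old, 32))
```

The design point matches the summary's contrast: for reverse KL, replay does not need to change what the objective optimizes, only to guarantee that old-mode samples remain visible with bounded influence in every finite batch.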