Iterative Finetuning is Mostly Idempotent
arXiv cs.AI / 5/5/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper investigates whether behavioral traits (e.g., sycophancy or misalignment) are amplified when a model is repeatedly fine-tuned on data generated by its own predecessor, starting from a model imbued with an initial persona or belief.
- Experiments across three training regimes (supervised fine-tuning of instruct models, synthetic document fine-tuning of base models, and direct preference optimization) find that under SFT and SDF most traits decay or hold constant, so repeated cycles are largely idempotent.
- Amplification is rare in non-RL fine-tuning, and when it does occur it typically comes at the cost of coherence, which acts as a practical brake on unchecked amplification.
- Under DPO, trait amplification occurs reliably with continual training that reinforces preferences for the model's own outputs, but it disappears when models are reinitialized at each cycle (see the toy sketch after this list).
- The authors conclude that amplification is most likely to arise during continual post-training, and that limiting or controlling that stage may be an effective defense against self-reinforcing undesirable behaviors.
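
To make the dynamics above concrete, here is a minimal toy sketch of the iterative loop. Everything in it (the `Model` class, `generate_corpus`, `finetune`, and the 0.9 and 1.1 update factors) is an illustrative assumption chosen only to reproduce the qualitative pattern the paper reports; it is not the paper's code or its measured numbers.

```python
from copy import deepcopy


class Model:
    """Toy stand-in for a language model carrying one behavioral trait."""

    def __init__(self, trait_level: float):
        self.trait_level = trait_level  # e.g. strength of sycophancy


def generate_corpus(model: Model) -> float:
    # A real pipeline would sample text from the model; here the corpus
    # is summarized by how strongly it expresses the trait.
    return model.trait_level


def finetune(student: Model, corpus_trait: float, regime: str) -> Model:
    # Illustrative update rules matching the qualitative findings above.
    new = deepcopy(student)
    if regime in ("sft", "sdf"):
        # Imitation pulls the student toward (slightly below) the data,
        # so the trait decays or holds constant across cycles.
        new.trait_level = 0.9 * corpus_trait
    elif regime == "dpo":
        # Preferring the model's own outputs nudges the trait upward
        # relative to whatever the student's weights already encode.
        new.trait_level = 1.1 * new.trait_level
    return new


def run_cycles(regime: str, n_cycles: int = 5, reinit: bool = False) -> list[float]:
    base = Model(trait_level=1.0)  # initial persona / belief strength
    model = deepcopy(base)
    trajectory = []
    for _ in range(n_cycles):
        corpus_trait = generate_corpus(model)          # predecessor's data
        student = deepcopy(base) if reinit else model  # fresh or continual weights
        model = finetune(student, corpus_trait, regime)
        trajectory.append(round(model.trait_level, 3))
    return trajectory


if __name__ == "__main__":
    print("SFT, continual:", run_cycles("sft"))                 # decays toward zero
    print("DPO, continual:", run_cycles("dpo"))                 # compounds each cycle
    print("DPO, reinit:   ", run_cycles("dpo", reinit=True))    # flat, no compounding
```

The only structural difference between the two DPO runs is the `reinit` flag: continual training keeps compounding the preference update in the same weights, while reinitializing each cycle resets the trait to the base level before training, which is why amplification flattens out.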