Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes
arXiv cs.LG / 3/31/2026
Key Points
- The paper presents a controlled empirical study of Low-Rank Adaptation (LoRA) for sequential fine-tuning of pretrained transformer encoders, asking whether LoRA reduces catastrophic forgetting relative to full fine-tuning.
- Across five repeated runs on a BERT-base task sequence (RTE→MRPC→CoLA→SST-2), full fine-tuning shows roughly 19.9%±4.8% average forgetting, while standard LoRA (r=8 on the query/value projections; see the configuration sketch after this list) reduces forgetting to roughly 0.6%±1.4%, a statistically significant improvement.
- Task-level analyses and secondary experiments on RoBERTa-base confirm that LoRA's reduced forgetting is not merely an aggregate artifact; LoRA also outperforms the strongest Elastic Weight Consolidation (EWC) baseline (≈15.5%±1.4% forgetting).
- A six-task extension shows that a low average-forgetting score (see the metric sketch after this list) can mask substantial task-level heterogeneity, underscoring the need for per-task rather than purely aggregate evaluation in continual learning settings.
- Freezing and representation-probe ablations support a mechanistic account: forgetting drops sharply once more than roughly 95% of parameters are frozen, and probes (a sketch follows below) suggest that keeping the backbone frozen preserves a more stable shared feature scaffold, with full fine-tuning diverging most clearly at the final transformer layer.
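
As a concrete illustration of the adapter setup described above, the sketch below builds a rank-8 LoRA configuration on the query/value projections of a BERT-base classifier using the Hugging Face PEFT library. The model name, label count, lora_alpha, and dropout are assumptions for illustration; the summary only specifies r=8 on query/value modules.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Base encoder with a classification head (model name and num_labels are
# assumptions; the summary only says "BERT-base").
base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Rank-8 LoRA on the attention query/value projections, as in the key points.
# lora_alpha and lora_dropout are illustrative defaults, not values from the paper.
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["query", "value"],
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of weights train
```

Sequential fine-tuning would then train this wrapped model on each task in turn, with the backbone weights frozen and only the LoRA matrices and classification head updating.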
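The forgetting percentages above are averages over tasks. A common way to compute such a score, following the standard continual-learning definition (which may differ in detail from the paper's exact metric), is sketched below: forgetting for a task is its best accuracy seen during the sequence minus its accuracy after the final task.

```python
import numpy as np

def average_forgetting(acc: np.ndarray) -> float:
    """Standard average-forgetting metric for a T-task sequence.

    acc[t, i] is accuracy on task i evaluated after training on task t,
    so only entries with t >= i are meaningful. Forgetting for task i is
    the best accuracy it ever reached minus its accuracy after the final
    task; the last task is excluded because it cannot yet be forgotten.
    """
    T = acc.shape[0]
    per_task = [acc[i:T - 1, i].max() - acc[T - 1, i] for i in range(T - 1)]
    return float(np.mean(per_task))

# Purely illustrative 4-task example (not numbers from the paper):
acc = np.array([
    [0.70, 0.00, 0.00, 0.00],
    [0.62, 0.85, 0.00, 0.00],
    [0.55, 0.80, 0.58, 0.00],
    [0.50, 0.78, 0.55, 0.92],
])
print(average_forgetting(acc))  # mean of (0.70-0.50, 0.85-0.78, 0.58-0.55) = 0.10
```

Reporting the per-task terms alongside the mean is what exposes the task-level heterogeneity noted in the six-task extension.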
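The representation probes mentioned in the last key point can be implemented several ways; one common choice, sketched here under the assumption of a linear probe on per-layer [CLS] features from a frozen Hugging Face model, fits a classifier on hidden states and compares accuracy across layers and checkpoints. The helper name and batch format are hypothetical, not the paper's code.

```python
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def cls_features(model, dataloader, layer: int, device: str = "cpu"):
    """Collect [CLS] hidden states from one transformer layer of a frozen model."""
    model.eval().to(device)
    feats, labels = [], []
    for batch in dataloader:
        out = model(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
            output_hidden_states=True,
        )
        feats.append(out.hidden_states[layer][:, 0].cpu())  # [CLS] token vector
        labels.append(batch["labels"])
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# A probe is a linear classifier trained on these frozen features, e.g.:
# X_tr, y_tr = cls_features(model, train_loader, layer=12)
# X_te, y_te = cls_features(model, test_loader, layer=12)
# probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# print(probe.score(X_te, y_te))  # compare across layers and fine-tuning methods
```

Running such probes layer by layer for LoRA versus full fine-tuning is the kind of analysis that would localize divergence to the final transformer layer, as the key point describes.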