Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes
arXiv cs.LG / March 31, 2026
Key Points
- The paper presents a controlled empirical study of Low-Rank Adaptation (LoRA) for sequential fine-tuning of pretrained transformer encoders, focusing on whether it reduces catastrophic forgetting versus full fine-tuning.
- Across five reruns on a BERT-base task sequence (RTE→MRPC→CoLA→SST-2), full fine-tuning shows roughly 19.9%±4.8% average forgetting, while standard LoRA (rank r=8 adapters on the query/value modules) reduces it to roughly 0.6%±1.4%, a statistically significant improvement.
- Task-level analyses and secondary experiments on RoBERTa-base confirm that LoRA’s reduced forgetting is not merely an aggregate artifact; LoRA also outperforms the strongest Elastic Weight Consolidation (EWC) baseline (≈15.5%±1.4% forgetting).
- A six-task extension demonstrates that low average forgetting can mask substantial task-level heterogeneity, highlighting the need for more granular evaluation in continual learning settings.
- Freezing and representation-probe ablations support a mechanistic account: forgetting drops sharply once more than ~95% of parameters are frozen, and probes suggest that freezing the backbone preserves a more stable shared feature scaffold, with full fine-tuning diverging most clearly at the final transformer layer.
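The LoRA setup summarized above keeps the pretrained weight frozen and trains only a low-rank update. A minimal NumPy sketch of the forward pass, with illustrative dimensions and helper names that are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_linear(x, W, A, B, alpha=16, r=8):
    """Frozen weight W plus trainable low-rank update scaled by alpha/r."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

d = 32
W = rng.normal(size=(d, d))            # frozen pretrained weight (not updated)
A = rng.normal(size=(8, d)) * 0.01     # trainable down-projection, rank r = 8
B = np.zeros((d, 8))                   # trainable up-projection, zero-initialized

x = rng.normal(size=(4, d))
# With B initialized to zero, the adapted layer reproduces the frozen layer.
assert np.allclose(lora_linear(x, W, A, B), x @ W.T)
# Trainable parameters: 2*r*d = 512, versus d*d = 1024 for full fine-tuning.
assert A.size + B.size < W.size
```

Zero-initializing B means sequential fine-tuning starts from the pretrained function exactly, and only the rank-8 subspace per adapted module can drift across tasks.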
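The average-forgetting percentages quoted above can be read against the standard continual-learning definition: for each non-final task, the drop from its best accuracy at any earlier checkpoint to its accuracy after the last task. A small sketch (hypothetical helper, not code from the paper):

```python
def average_forgetting(acc):
    """acc[i][j] = accuracy on task j measured after training task i (j <= i).

    Forgetting of task j = (best accuracy on j before the final task)
                           - (accuracy on j after the final task),
    averaged over all tasks except the last.
    """
    T = len(acc)
    per_task = []
    for j in range(T - 1):
        best_earlier = max(acc[i][j] for i in range(j, T - 1))
        per_task.append(best_earlier - acc[T - 1][j])
    return sum(per_task) / len(per_task)

# Toy two-task run: task 0 scores 0.80 right after training,
# then 0.60 after training task 1 -> forgetting = 0.20.
print(average_forgetting([[0.80], [0.60, 0.70]]))  # → 0.2
```

Under this metric, the paper's six-task result shows why the average alone can hide per-task heterogeneity: the same mean arises from uniform small drops or from one large collapse offset by stable tasks.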
