Fine-Tuning Regimes Define Distinct Continual Learning Problems

arXiv cs.LG / 4/24/2026


Key Points

  • The paper argues that in continual learning evaluations, the fine-tuning regime (the trainable parameter subspace) should be treated as an explicit experimental variable rather than being held fixed.
  • It formalizes adaptation regimes as projected optimization over fixed trainable subspaces and shows that changing the trainable depth changes the update signal balancing new-task learning and knowledge retention.
  • Experiments on task-incremental continual learning with five trainable-depth regimes and four methods (online EWC, LwF, SI, GEM) across multiple datasets find that method rankings vary across regimes.
  • The study finds that deeper adaptation regimes produce larger update magnitudes and higher forgetting, and strengthen the link between update size and forgetting.
  • Overall, the results motivate regime-aware evaluation protocols where trainable depth is included as a factor to avoid misleading cross-method comparisons.
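The "projected optimization over a fixed trainable subspace" framing above can be illustrated with a minimal sketch: a binary mask selects the trainable coordinates (the regime's trainable depth), and every gradient step is projected onto that subspace so frozen coordinates receive zero update. The function name and flat-list representation here are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: an adaptation regime as projected optimization.
# `mask` fixes the trainable subspace; frozen coordinates get zero update.

def projected_sgd_step(params, grads, mask, lr=0.1):
    """One SGD step restricted to the subspace selected by `mask`.

    params, grads, mask are flat lists of equal length; mask[i] is 1 if
    parameter i is trainable under the current regime, 0 if frozen.
    """
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]

# Example: a "shallow" regime that freezes the first two coordinates.
params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
mask = [0, 0, 1, 1]  # only the last two coordinates are trainable
print(projected_sgd_step(params, grads, mask))  # → [1.0, 2.0, 2.95, 3.95]
```

Varying `mask` (e.g. unfreezing progressively more coordinates) corresponds to varying the trainable-depth regime, which is exactly the experimental factor the paper argues should not be held fixed.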

Abstract

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable parameter subspace, is itself a key evaluation variable. We formalize adaptation regimes as projected optimization over fixed trainable subspaces, showing that changing the trainable depth alters the effective update signal through which both current-task fitting and knowledge preservation operate. This analysis motivates the hypothesis that method comparisons need not be invariant across regimes. We test this hypothesis in task-incremental CL with five trainable-depth regimes and four standard methods: online EWC, LwF, SI, and GEM. Across five benchmark datasets, namely MNIST, Fashion-MNIST, KMNIST, QMNIST, and CIFAR-100, and across 11 task orders per dataset, we find that the relative ranking of methods is not consistently preserved across regimes. We further show that deeper adaptation regimes are associated with larger update magnitudes, higher forgetting, and a stronger relationship between the two. These results show that comparative conclusions in CL can depend strongly on the chosen fine-tuning regime, motivating regime-aware evaluation protocols that treat trainable depth as an explicit experimental factor.
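The forgetting the abstract refers to is conventionally measured per task as the drop from a task's best accuracy seen during sequential training to its final accuracy. The sketch below implements that standard average-forgetting measure; the function name and the toy accuracy matrix are assumptions for illustration, not the paper's reported numbers.

```python
# Hedged sketch of the standard average-forgetting measure in
# task-incremental CL: for each earlier task, forgetting is the drop from
# its best accuracy during training to its accuracy after the final task.

def average_forgetting(acc):
    """acc[t][j] = accuracy on task j after training on task t (j <= t).

    Returns the mean, over all tasks except the last, of
    max_{t < T-1} acc[t][j] - acc[T-1][j].
    """
    T = len(acc)
    drops = []
    for j in range(T - 1):  # the last task cannot yet be forgotten
        best = max(acc[t][j] for t in range(j, T - 1))
        drops.append(best - acc[T - 1][j])
    return sum(drops) / len(drops)

# Toy accuracy matrix for three sequential tasks (lower triangle only).
acc = [
    [0.95],
    [0.90, 0.93],
    [0.80, 0.85, 0.94],
]
print(average_forgetting(acc))  # ≈ 0.115
```

Under the paper's hypothesis, recomputing this measure under different trainable-depth regimes would yield systematically higher values for deeper regimes, alongside larger update magnitudes.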