A Faster Path to Continual Learning

arXiv cs.LG / 4/14/2026


Key Points

  • In Continual Learning (CL), C-Flat is a promising optimizer that encourages uniformly low-loss regions for both new and old tasks, but it requires three additional gradient computations per iteration, making computational cost a key concern.
  • This paper proposes C-Flat Turbo, an improved optimizer that exploits the observation that the gradients associated with first-order flatness contain direction-invariant components, allowing redundant gradient computations to be skipped.
  • Building further on the observation that these flatness-promoting gradients progressively stabilize across tasks, the method introduces a linear scheduling strategy with an adaptive trigger that allocates larger "turbo steps" to later tasks.
  • Experiments report that C-Flat Turbo is 1.0× to 1.25× faster than C-Flat across a wide range of CL methods, with comparable or improved accuracy.
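The gradient-reuse and scheduling ideas above can be illustrated with a toy sketch. This is not the authors' implementation: the class name, the SAM-style ascent/descent structure, the `turbo` reuse counter, and the `max_turbo` schedule parameter are all illustrative assumptions, standing in for C-Flat's actual three-gradient update.

```python
import numpy as np

def turbo_steps_schedule(task_idx, num_tasks, max_turbo=4):
    """Hypothetical linear schedule: later tasks get more 'turbo'
    (gradient-reuse) steps, since flatness gradients stabilize."""
    frac = task_idx / max(num_tasks - 1, 1)
    return 1 + int(round(frac * (max_turbo - 1)))

class CFlatTurboSketch:
    """Toy sharpness-aware step (SAM-style, not the paper's exact update):
    the perturbation direction is recomputed only every `turbo`-th call;
    in between, the cached direction is reused, skipping one gradient."""

    def __init__(self, grad_fn, lr=0.1, rho=0.05, turbo=1):
        self.grad_fn = grad_fn      # callable: params -> gradient
        self.lr, self.rho = lr, rho
        self.turbo = turbo          # reuse cached direction this many steps
        self._cached_dir = None
        self._age = 0

    def step(self, w):
        # Recompute the ascent direction only when the cache is stale.
        if self._cached_dir is None or self._age >= self.turbo:
            g = self.grad_fn(w)
            self._cached_dir = g / (np.linalg.norm(g) + 1e-12)
            self._age = 0
        self._age += 1
        # Ascend to the perturbed point, descend with the gradient there.
        w_adv = w + self.rho * self._cached_dir
        return w - self.lr * self.grad_fn(w_adv)
```

For example, minimizing f(w) = ||w||² (gradient 2w) with `turbo=2` halves the number of perturbation-gradient evaluations while still converging; the schedule would then grow `turbo` from 1 on the first task toward `max_turbo` on the last.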

Abstract

Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old tasks. However, C-Flat requires three additional gradient computations per iteration, imposing substantial overhead on the optimization process. In this work, we propose C-Flat Turbo, a faster yet stronger optimizer that significantly reduces the training cost. We show that the gradients associated with first-order flatness contain direction-invariant components relative to the proxy-model gradients, enabling us to skip redundant gradient computations in the perturbed ascent steps. Moreover, we observe that these flatness-promoting gradients progressively stabilize across tasks, which motivates a linear scheduling strategy with an adaptive trigger to allocate larger turbo steps for later tasks. Experiments show that C-Flat Turbo is 1.0× to 1.25× faster than C-Flat across a wide range of CL methods, while achieving comparable or even improved accuracy.