Natural Gradient Descent for Online Continual Learning

arXiv cs.LG, March 24, 2026


Key Points

  • The paper targets Online Continual Learning (OCL) for image classification, where models must learn from a data stream without assuming i.i.d. data and must avoid catastrophic forgetting.
  • It proposes a training approach using Natural Gradient Descent with an approximation of the Fisher Information Matrix via Kronecker-Factored Approximate Curvature (KFAC) to improve convergence in the online setting.
  • The method yields substantial performance gains across multiple existing OCL methods, indicating the optimizer/curvature component is broadly beneficial.
  • Experiments on Split CIFAR-100, CORE50, and Split miniImageNet show the improvements are especially pronounced when the proposed optimizer is combined with other OCL “tricks.”

Abstract

Online Continual Learning (OCL) for image classification represents a challenging subset of Continual Learning, focusing on classifying images from a stream without assuming data independence and identical distribution (i.i.d.). The primary challenge in this context is to prevent catastrophic forgetting, where the model's performance on previous tasks deteriorates as it learns new ones. Although various strategies have been proposed to address this issue, achieving rapid convergence remains a significant challenge in the online setting. In this work, we introduce a novel approach to training OCL models that utilizes the Natural Gradient Descent optimizer, incorporating an approximation of the Fisher Information Matrix (FIM) through Kronecker-Factored Approximate Curvature (KFAC). This method demonstrates substantial improvements in performance across all OCL methods, particularly when combined with existing OCL tricks, on datasets such as Split CIFAR-100, CORE50, and Split miniImageNet.
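To make the core idea concrete: KFAC approximates a layer's Fisher block as a Kronecker product of an input-activation covariance and an output-gradient covariance, which turns the natural-gradient solve into two small matrix inversions. The sketch below shows this update for a single toy linear layer; it is a minimal illustration of the general KFAC technique, not the paper's implementation, and the dimensions, learning rate, and damping value are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer: out = W @ a, with an (out_dim x in_dim) weight matrix.
in_dim, out_dim, batch = 8, 4, 32
W = rng.normal(size=(out_dim, in_dim))

a = rng.normal(size=(in_dim, batch))   # layer inputs (activations)
g = rng.normal(size=(out_dim, batch))  # backpropagated output gradients
G = g @ a.T / batch                    # ordinary (Euclidean) gradient dL/dW

# KFAC: approximate the layer's Fisher block as F ≈ A ⊗ S, where
# A is the input covariance and S is the output-gradient covariance.
A = a @ a.T / batch
S = g @ g.T / batch

# Tikhonov damping keeps both Kronecker factors invertible
# (the damping strength here is an assumed hyperparameter).
damping = 1e-3
A_inv = np.linalg.inv(A + damping * np.eye(in_dim))
S_inv = np.linalg.inv(S + damping * np.eye(out_dim))

# Natural-gradient step: (A ⊗ S)^{-1} vec(G) un-vectorizes to S^{-1} G A^{-1},
# so the update never forms the full Fisher matrix.
lr = 0.1
W -= lr * S_inv @ G @ A_inv
```

The practical appeal in the online setting is that each factor is only as large as the layer's input or output dimension, so the curvature-corrected step stays cheap enough to apply on every minibatch of the stream.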