Dynamic Distillation and Gradient Consistency for Robust Long-Tailed Incremental Learning

arXiv cs.CV / 5/6/2026


Key Points

  • The paper targets Long-tailed Class Incremental Learning (LT-CIL), where new classes arrive sequentially with highly imbalanced distributions, making catastrophic forgetting worse while also causing under-learning of minority classes and overfitting of majority classes.
  • It introduces gradient consistency regularization, which uses a moving average of gradients to damp abrupt fluctuations and stabilize training (a minimal sketch follows this list).
  • It proposes dynamically re-weighting the distillation loss according to the degree of class imbalance, measured via normalized entropy, to balance old-knowledge retention against learning new classes (illustrated in the sketch after the abstract).
  • Experiments on CIFAR-100-LT, ImageNetSubset-LT, and Food101-LT show accuracy gains up to 5.0% and especially large improvements in the difficult “In-ordered” setting (majority-to-minority task order).
  • The authors report that the gains are achieved without significant additional computational overhead, supporting the method’s practical deployment potential.
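
To make the first idea concrete, here is a minimal sketch of gradient consistency regularization, read as an exponential moving average (EMA) over per-parameter gradients. The paper's exact formulation is not reproduced in this summary, so the class name `GradEMASmoother`, the `beta` decay, and the blend-then-step design are all illustrative assumptions.

```python
# A minimal sketch of gradient consistency regularization, interpreted here
# as an exponential moving average (EMA) over per-parameter gradients.
# Class name, beta value, and blend-then-step design are assumptions.
import torch


class GradEMASmoother:
    """Keeps an EMA of each parameter's gradient and smooths fresh gradients."""

    def __init__(self, params, beta: float = 0.9):
        self.params = [p for p in params if p.requires_grad]
        self.beta = beta  # EMA decay: higher values smooth more aggressively
        self.ema = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def smooth_(self) -> None:
        """Update each gradient EMA, then overwrite p.grad with the average."""
        for p, g_ema in zip(self.params, self.ema):
            if p.grad is None:
                continue
            g_ema.mul_(self.beta).add_(p.grad, alpha=1.0 - self.beta)
            p.grad.copy_(g_ema)  # optimizer then steps on the smoothed grad


# Usage inside a standard training loop (model, optimizer, loader assumed):
#   smoother = GradEMASmoother(model.parameters(), beta=0.9)
#   for x, y in loader:
#       optimizer.zero_grad()
#       loss_fn(model(x), y).backward()
#       smoother.smooth_()   # damp abrupt gradient fluctuations
#       optimizer.step()
```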

Abstract

The task of Long-tailed Class Incremental Learning (LT-CIL) addresses the sequential learning of new classes from datasets with imbalanced class distributions. This scenario compounds catastrophic forgetting, the fundamental problem inherent to continual learning, with the dual challenges of under-learning minority classes and overfitting majority classes. To tackle these combined issues, this paper proposes two main techniques. First, we introduce gradient consistency regularization, which leverages the moving average of gradients to suppress abrupt fluctuations and stabilize the training process. Second, we dynamically adjust the weight of the distillation loss by measuring the degree of class imbalance with normalized entropy. This adaptive weighting balances retaining old knowledge against acquiring new information. Experiments on the CIFAR-100-LT, ImageNetSubset-LT, and Food101-LT benchmarks show that our method achieves consistent accuracy improvements of up to 5.0%. Furthermore, we demonstrate substantial gains in the challenging 'In-ordered' setting, where tasks progress from majority to minority classes, highlighting our method's robustness in mitigating forgetting under unfavorable learning dynamics. This enhanced performance is achieved without a significant increase in computational overhead, demonstrating the practicality of our framework.
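
As an illustration of the second component, below is a small sketch of the normalized-entropy imbalance measure. The entropy normalization follows the standard definition (Shannon entropy divided by log K, so 1.0 means perfectly balanced and values near 0.0 mean extreme imbalance); the mapping from that score to a distillation-loss coefficient in `distillation_weight` is a hypothetical choice for illustration, not the paper's exact rule.

```python
# A small sketch of the normalized-entropy imbalance measure. The mapping
# from the balance score to a distillation coefficient is a hypothetical
# choice for illustration, not the paper's exact rule.
import math


def normalized_entropy(class_counts: list[int]) -> float:
    """Shannon entropy of the class-frequency distribution, scaled to [0, 1]."""
    total = sum(class_counts)
    k = len(class_counts)
    if k <= 1 or total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)
    return h / math.log(k)


def distillation_weight(class_counts: list[int], base_lambda: float = 1.0) -> float:
    """Hypothetical mapping: scale a base coefficient by the balance score."""
    return base_lambda * normalized_entropy(class_counts)


# Example: a long-tailed task with counts [500, 100, 20, 5] scores well
# below 1.0, so the distillation term is down-weighted for that task.
print(distillation_weight([500, 100, 20, 5]))  # ~0.45 with base_lambda=1.0
```

Whether a more balanced incoming task should strengthen or weaken distillation is precisely the trade-off the authors tune adaptively; the linear scaling above is only one plausible instantiation.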