A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters
arXiv cs.AI / 3/13/2026
Key Points
- SimE is a simple and efficient framework for incremental learning (IL) that inserts adapters into a vision-language model to address three common IL pain points: training efficiency, reliance on a memory bank, and restrictive backbone requirements.
- The paper reveals a nonlinear relationship between adapter connections and IL performance: adding connections between transformer blocks helps, while adding more within-block connections can hurt IL ability when the incremental steps are small.
- Empirical results show SimE surpasses traditional methods by 9.6% on TinyImageNet and outperforms other CLIP-based methods by 5.3% on CIFAR-100.
- The authors propose boosting zero-shot capability by swapping SimE's encoder for a CLIP model trained on larger datasets (e.g., LAION-2B) with stronger architectures (e.g., ViT-L/14).
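The summary does not give SimE's exact adapter design, but the "between-block connections" it describes follow the standard residual bottleneck-adapter pattern: a small trainable module with a nonlinear activation placed after each frozen transformer block. Below is a minimal pure-Python sketch of that pattern; all class names, dimensions, and the placement loop are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Elementwise nonlinearity -- the 'nonlinear' part of the adapter."""
    return [max(0.0, a) for a in v]

class BottleneckAdapter:
    """Residual bottleneck adapter (hypothetical): x + W_up @ relu(W_down @ x).

    Only the two small projection matrices would be trained;
    the surrounding transformer blocks stay frozen.
    """
    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_model)
        # Down-projection to a small bottleneck dimension.
        self.W_down = [[rng.uniform(-scale, scale) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        # Up-projection back to the model dimension.
        self.W_up = [[rng.uniform(-scale, scale) for _ in range(d_bottleneck)]
                     for _ in range(d_model)]

    def __call__(self, x):
        h = relu(matvec(self.W_down, x))
        # Residual connection: the adapter only perturbs the block output.
        return [xi + ui for xi, ui in zip(x, matvec(self.W_up, h))]

# Between-block placement: one adapter after each (frozen) transformer block.
adapters = [BottleneckAdapter(d_model=8, d_bottleneck=2, seed=i)
            for i in range(3)]
x = [0.1] * 8
for adapter in adapters:
    # In a real model, the frozen block would transform x first.
    x = adapter(x)
print(len(x))  # dimensionality is preserved through every adapter
```

The residual form means an adapter whose up-projection is zero is an identity map, so adding adapters cannot break the frozen backbone at initialization; this is one plausible reason between-block connections help, while the paper's finding that extra within-block connections can hurt at small incremental steps is an empirical result, not something this sketch demonstrates.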
