A Multihead Continual Learning Framework for Fine-Grained Fashion Image Retrieval with Contrastive Learning and Exponential Moving Average Distillation

arXiv cs.CV / 3/24/2026


Key Points

  • The paper argues that existing fine-grained fashion image retrieval (FIR) methods assume a static attribute/class space and require costly full retraining when new attributes appear, motivating class-incremental learning for dynamic settings.
  • It proposes MCL-FIR, a multihead continual learning framework that accommodates an evolving class space across increments and reformulates triplet inputs into doublets trained with an InfoNCE-style contrastive loss.
  • The method adds exponential moving average (EMA) distillation to transfer knowledge efficiently across increments without needing repeated full retraining.
  • Experiments on four datasets show that MCL-FIR improves scalability, achieves a favorable efficiency–accuracy tradeoff, and outperforms continual-learning baselines under comparable training cost.
  • Compared with static retraining approaches, the framework reaches comparable retrieval performance while using roughly 30% of the training cost, and the authors provide public source code.
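The triplet-to-doublet reformulation mentioned above can be sketched as a standard in-batch InfoNCE loss: each (anchor, positive) pair forms a doublet, and the other positives in the batch serve as negatives, so no explicit negative mining is needed. The function name, temperature value, and normalization scheme below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def infonce_from_doublets(anchors, positives, temperature=0.07):
    """In-batch InfoNCE over (anchor, positive) doublets.

    The i-th positive is the match for the i-th anchor; the remaining
    positives in the batch act as negatives. Hypothetical sketch --
    the paper's exact loss and temperature may differ.
    """
    # L2-normalize embeddings so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # diagonal entries are positives
```

Compared with a triplet loss, this treats every non-matching in-batch item as a negative, which is the usual reason InfoNCE-style training is simpler and more sample-efficient.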

Abstract

Most fine-grained fashion image retrieval (FIR) methods assume a static setting, requiring full retraining when new attributes appear, which is costly and impractical for dynamic scenarios. Although pretrained models support zero-shot inference, their accuracy drops without supervision, and no prior work explores class-incremental learning (CIL) for fine-grained FIR. We propose a multihead continual learning framework for fine-grained fashion image retrieval with contrastive learning and exponential moving average (EMA) distillation (MCL-FIR). MCL-FIR adopts a multi-head design to accommodate evolving classes across increments, reformulates triplet inputs into doublets with InfoNCE for simpler and more effective training, and employs EMA distillation for efficient knowledge transfer. Experiments across four datasets demonstrate that, beyond its scalability, MCL-FIR achieves a strong balance between efficiency and accuracy. It significantly outperforms CIL baselines under similar training cost, and compared with static methods, it delivers comparable performance while using only about 30% of the training cost. The source code is publicly available at https://github.com/Dr-LingXiao/MCL-FIR.
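The EMA distillation described in the abstract keeps a slowly updated "teacher" copy of the model whose parameters are an exponential moving average of the student's; the student is then regularized toward the teacher's outputs, transferring knowledge across increments without replaying or retraining on old data. The decay value and the L2 distillation penalty below are illustrative assumptions, not necessarily the paper's choices.

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """EMA parameter update: teacher <- decay * teacher + (1 - decay) * student.

    Pure-Python sketch over flat lists of scalar parameters; real models
    would apply this per-tensor. The decay value is a hypothetical default.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

def distillation_penalty(student_out, teacher_out):
    """Simple L2 distillation term pulling student outputs toward the
    (frozen) EMA teacher's outputs -- one common hedged choice; the
    paper may use a different divergence."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)
```

At each increment the teacher changes only slowly, so it acts as a stable snapshot of past knowledge while the student adapts to new classes.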