Dual-Imbalance Continual Learning for Real-World Food Recognition

arXiv cs.CV / 4/1/2026


Key Points

  • The paper introduces DIME, a continual learning method for real-world food recognition that explicitly handles “dual imbalance” from both long-tailed class frequencies and uneven numbers of newly introduced categories at each step.
  • DIME uses parameter-efficient fine-tuning to learn lightweight per-task adapters and then progressively merges them using a class-count guided spectral merging strategy.
  • A rank-wise threshold modulation mechanism is proposed to stabilize adapter merging by retaining dominant knowledge while enabling adaptive updates.
  • The approach outputs a single merged adapter for inference, aiming to keep deployment efficient without maintaining task-specific modules.
  • Experiments on realistic long-tailed food benchmarks under a step-imbalanced protocol show DIME improves by more than 3% over prior strong continual learning baselines, with code released on GitHub.
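The paper summary above does not give formulas for the merging step, but the idea of class-count guided spectral merging with rank-wise threshold modulation can be sketched. The following is a minimal illustration under stated assumptions, not the authors' implementation: each per-task adapter is assumed to be a low-rank weight update `ΔW`, merge weights are assumed proportional to the number of new classes each task introduced, and `tau` is a hypothetical rank-wise gate standing in for the paper's threshold modulation.

```python
import numpy as np

def spectral_merge(delta_ws, class_counts, rank, tau=0.05):
    """Sketch: merge per-task adapter updates via SVD, weighted by class counts.

    delta_ws:     list of (d_out, d_in) adapter update matrices, one per task.
    class_counts: new classes introduced at each step; steps that introduced
                  more classes contribute more to the merged adapter.
    rank:         number of singular directions kept in the merged adapter.
    tau:          rank-wise threshold -- singular values below tau * s_max are
                  damped rather than kept at full strength (a hypothetical
                  stand-in for the paper's rank-wise threshold modulation).
    """
    counts = np.asarray(class_counts, dtype=float)
    weights = counts / counts.sum()               # class-count guided weights
    merged = sum(w * dw for w, dw in zip(weights, delta_ws))

    # Spectral step: keep the dominant directions of the weighted sum,
    # softly gating weak directions instead of cutting them off abruptly.
    u, s, vt = np.linalg.svd(merged, full_matrices=False)
    gate = np.where(s >= tau * s[0], 1.0, s / (tau * s[0]))
    s_mod = s * gate
    k = min(rank, len(s_mod))
    return (u[:, :k] * s_mod[:k]) @ vt[:k]        # single merged adapter
```

The key property this sketch preserves from the description is that inference uses one merged low-rank matrix rather than a growing set of task-specific adapters, and that steps introducing many new classes dominate the merge.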

Abstract

Visual food recognition in real-world dietary logging scenarios naturally exhibits severe data imbalance, where a small number of food categories appear frequently while many others occur rarely, resulting in long-tailed class distributions. In practice, food recognition systems often operate in a continual learning setting, where new categories are introduced sequentially over time. However, existing studies typically assume that each incremental step introduces a similar number of new food classes, which rarely happens in the real world, where the number of newly observed categories can vary significantly across steps, leading to highly uneven learning dynamics. As a result, continual food recognition exhibits a dual imbalance: imbalanced samples within each food class and imbalanced numbers of new food classes to learn at each incremental learning step. In this work, we introduce DIME, a Dual-Imbalance-aware Adapter Merging framework for continual food recognition. DIME learns lightweight adapters for each task using parameter-efficient fine-tuning and progressively integrates them through a class-count guided spectral merging strategy. A rank-wise threshold modulation mechanism further stabilizes the merging process by preserving dominant knowledge while allowing adaptive updates. The resulting model maintains a single merged adapter for inference, enabling efficient deployment without accumulating task-specific modules. Experiments on realistic long-tailed food benchmarks under our step-imbalanced setup show that the proposed method consistently improves by more than 3% over the strongest existing continual learning baselines. Code is available at https://github.com/xiaoyanzhang1/DIME.