Vision-Language Attribute Disentanglement and Reinforcement for Lifelong Person Re-Identification

arXiv cs.CV / 3/23/2026


Key Points

  • VLADR is a new vision-language model (VLM)-driven lifelong person re-identification method that aims to improve cross-domain knowledge transfer while mitigating forgetting.
  • It introduces a Multi-grain Text Attribute Disentanglement mechanism to mine global and diverse local text attributes in images for finer-grained cross-modal learning.
  • It proposes an Inter-domain Cross-modal Attribute Reinforcement scheme that aligns attributes across domains to guide visual attribute extraction and transfer knowledge.
  • Experiments show VLADR outperforms state-of-the-art methods by about 1.9–2.2% in anti-forgetting and 2.1–2.5% in generalization, with code available at https://github.com/zhoujiahuan1991/CVPR2026-VLADR.

Abstract

Lifelong person re-identification (LReID) aims to learn from varying domains to obtain a unified person retrieval model. Existing LReID approaches typically learn from scratch or from a model pretrained on visual classification, whereas Vision-Language Models (VLMs) have shown generalizable knowledge across a variety of tasks. Although existing methods can be directly adapted to a VLM, they consider only global-aware learning, so fine-grained attribute knowledge is underexploited, leading to limited knowledge acquisition and anti-forgetting capacity. To address this problem, we introduce a novel VLM-driven LReID approach named Vision-Language Attribute Disentanglement and Reinforcement (VLADR). Our key idea is to explicitly model universally shared human attributes to improve inter-domain knowledge transfer, thereby effectively using historical knowledge to reinforce new knowledge learning and alleviate forgetting. Specifically, VLADR includes a Multi-grain Text Attribute Disentanglement mechanism that mines the global and diverse local text attributes of an image. An Inter-domain Cross-modal Attribute Reinforcement scheme is then developed, which introduces cross-modal attribute alignment to guide visual attribute extraction and adopts inter-domain attribute alignment to achieve fine-grained knowledge transfer. Experimental results demonstrate that VLADR outperforms state-of-the-art methods by 1.9%–2.2% in anti-forgetting and 2.1%–2.5% in generalization capacity. Our source code is available at https://github.com/zhoujiahuan1991/CVPR2026-VLADR.
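
The paper does not spell out the exact loss, but the cross-modal attribute alignment it describes is commonly implemented as a symmetric contrastive objective that pulls each visual attribute embedding toward its paired text attribute embedding. The sketch below is a minimal, hypothetical illustration of that idea (an InfoNCE-style loss in NumPy); the function names, the temperature value, and the use of cosine similarity are all assumptions, not details from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize rows so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_modal_alignment_loss(visual_attrs, text_attrs, temperature=0.07):
    """Hypothetical symmetric InfoNCE-style alignment loss.

    visual_attrs, text_attrs: (N, D) arrays; row i of each is a matched
    visual/text attribute pair. Matched pairs are treated as positives,
    all other rows in the batch as negatives.
    """
    v = l2_normalize(visual_attrs)
    t = l2_normalize(text_attrs)
    logits = v @ t.T / temperature          # (N, N) scaled cosine similarities
    labels = np.arange(len(v))

    def ce(lg):
        # Cross-entropy with the diagonal (matched pair) as the target class.
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average over both retrieval directions (visual->text and text->visual).
    return 0.5 * (ce(logits) + ce(logits.T))

# Toy usage: perfectly matched pairs should score lower than random pairs.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
loss_matched = cross_modal_alignment_loss(feats, feats)
loss_random = cross_modal_alignment_loss(feats, rng.normal(size=(4, 8)))
```

In a VLM-driven pipeline such a term would typically be added to the re-identification loss, so that text attributes act as anchors guiding which visual features are extracted per attribute.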