Thinking Before Matching: A Reinforcement Reasoning Paradigm Towards General Person Re-Identification

arXiv cs.CV / 4/22/2026

Key Points

  • The paper proposes ReID-R, a reinforcement reasoning paradigm for person re-identification (ReID) that aims to learn identity-causal cues rather than relying mainly on perception-driven fitting to large annotated datasets.
  • ReID-R integrates chain-of-thought (CoT) reasoning into the ReID pipeline in two stages: a discriminative reasoning warm-up trained without CoT labels, followed by an efficient reinforcement learning stage that uses non-trivial sampling to build scene-generalizable training data.
  • By using high-quality reward signals, the method guides the model to focus on identity-related visual cues, improving both identification accuracy and reasoning behavior.
  • Experiments across multiple ReID benchmarks show performance competitive with leading methods while using only 14.3K non-trivial samples (about 20.9% of the data scale used by existing methods), indicating improved data efficiency.
  • The authors claim ReID-R’s built-in reasoning also yields higher-quality interpretations of results, not just improved accuracy.
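
As a quick consistency check on the data-efficiency figure, and assuming the 20.9% ratio is computed against a single reference data size, the implied scale of the prior data is:

```latex
N_{\text{prior}} \approx \frac{N_{\text{ReID-R}}}{0.209} = \frac{14.3\,\text{K}}{0.209} \approx 68.4\,\text{K samples}
```

That is, roughly 68K samples for existing methods versus 14.3K for ReID-R, about a 4.8× reduction.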

Abstract

Learning identity-discriminative representations that generalize across multiple scenes has become a critical objective in person re-identification (ReID). However, mainstream perception-driven paradigms tend to fit identities to massive annotated data rather than understand identity-causal cues, which yields representations that are fragile under multiple disruptions. In this work, ReID-R is proposed as a novel reasoning-driven paradigm that achieves explicit identity understanding and reasoning by incorporating chain-of-thought (CoT) reasoning into the ReID pipeline. Specifically, ReID-R consists of two stages: (i) a discriminative reasoning warm-up, in which the model is trained without CoT labels to acquire identity-aware feature understanding; and (ii) efficient reinforcement learning, which uses a proposed non-trivial sampling strategy to construct scene-generalizable data. On this basis, ReID-R leverages high-quality reward signals to guide the model toward ID-related cues, yielding accurate reasoning and correct responses. Extensive experiments on multiple ReID benchmarks demonstrate that ReID-R achieves identity discrimination competitive with leading methods using only 14.3K non-trivial samples (20.9% of the existing data scale). Furthermore, benefiting from its inherent reasoning, ReID-R can provide high-quality interpretations of its results.
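
Neither the key points nor the abstract specify the reward design or how "non-trivial" samples are selected. The sketch below illustrates one common pattern from R1-style reinforcement learning with verifiable rewards, under stated assumptions: the model writes its reasoning in `<think>...</think>` and its decision in `<answer>...</answer>` tags; the reward adds a small format bonus to a 1.0 correctness term; and a sample counts as non-trivial when the current policy answers it correctly on some but not all of `k` rollouts. The tag format, the "same"/"different" label space, and all names (`Sample`, `parse_answer`, `reward`, `non_trivial_subset`, `scripted_policy`) are hypothetical; this is not ReID-R's actual implementation.

```python
import itertools
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical output format: reasoning in <think>...</think>, decision in <answer>...</answer>.
_FORMAT = re.compile(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL)


@dataclass
class Sample:
    prompt: str  # hypothetical text rendering of a query/gallery image pair
    gold: str    # ground-truth answer, e.g. "same" or "different" (assumed label space)


def parse_answer(response: str) -> Optional[str]:
    """Return the text inside <answer> if the whole response is well-formed, else None."""
    m = _FORMAT.fullmatch(response)
    return m.group(2).strip() if m else None


def reward(response: str, gold: str, format_bonus: float = 0.1) -> float:
    """Assumed rule-based reward: 0 for malformed output, a small format bonus
    for well-formed output, plus 1.0 when the parsed answer matches the ground truth."""
    answer = parse_answer(response)
    if answer is None:
        return 0.0
    return format_bonus + (1.0 if answer == gold else 0.0)


def non_trivial_subset(
    samples: Sequence[Sample], policy: Callable[[str], str], k: int = 8
) -> List[Sample]:
    """Keep samples the policy solves on some but not all of k rollouts (assumed criterion)."""
    kept = []
    for s in samples:
        hits = sum(parse_answer(policy(s.prompt)) == s.gold for _ in range(k))
        if 0 < hits < k:
            kept.append(s)
    return kept


def scripted_policy(script: Dict[str, List[str]]) -> Callable[[str], str]:
    """Deterministic stand-in for the multimodal model: cycles through scripted answers per prompt."""
    cycles = {prompt: itertools.cycle(answers) for prompt, answers in script.items()}

    def policy(prompt: str) -> str:
        return f"<think>compare identity cues</think><answer>{next(cycles[prompt])}</answer>"

    return policy


if __name__ == "__main__":
    samples = [
        Sample("easy pair", "same"),
        Sample("hard pair", "different"),
        Sample("mixed pair", "same"),
    ]
    policy = scripted_policy({
        "easy pair": ["same"],                # always correct -> trivial, dropped
        "hard pair": ["same"],                # never correct -> trivial, dropped
        "mixed pair": ["same", "different"],  # correct on half the rollouts -> kept
    })
    print([s.prompt for s in non_trivial_subset(samples, policy, k=4)])  # ['mixed pair']
```

The filtering rule is motivated by group-relative methods such as GRPO: if all `k` rollouts for a sample receive the same reward, the group-normalized advantages are all zero and the sample contributes no policy gradient, so always-solved and never-solved samples mostly waste compute.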