Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification

arXiv cs.CV / 4/24/2026


Key Points

  • The paper addresses unsupervised video-based visible–infrared person re-identification (VI-ReID), which is more realistic for surveillance than existing approaches that are mostly image-focused or supervised.
  • It proposes HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework that avoids explicit hard pseudo-label assignment while learning from RGB and infrared tracklets.
  • HiTPro uses a temporal-aware feature encoder to produce both discriminative frame-level features and robust tracklet-level representations.
  • It introduces hierarchical alignment with two-stage positive mining (within-modality first, then cross-modality) using dynamic thresholds and soft weight assignment, followed by hierarchical contrastive learning across three alignment levels.
  • Experiments on HITSZ-VCM and BUPTCampus show that HiTPro achieves state-of-the-art results in fully unsupervised settings and sets a strong baseline for future work.
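To make the first encoder step concrete, here is a minimal sketch of attention-weighted temporal pooling: frame-level features are scored, the scores are softmax-normalized, and the weighted average becomes the tracklet-level representation. The function name and the single learned scoring vector are illustrative assumptions, not the paper's exact encoder.

```python
import numpy as np

def temporal_pool(frame_feats: np.ndarray, score_w: np.ndarray) -> np.ndarray:
    """Aggregate frame-level features (T, D) into one tracklet-level
    representation (D,) via softmax-weighted temporal pooling.
    Illustrative sketch only; the paper's encoder may differ."""
    scores = frame_feats @ score_w                 # (T,) per-frame importance
    scores -= scores.max()                         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frame_feats                   # (D,) weighted average

# Usage: pool 8 frame features of dimension 16 into one tracklet feature.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
tracklet = temporal_pool(feats, rng.normal(size=16))  # shape (16,)
```

Because the weights form a convex combination, a tracklet of identical frames pools back to that frame feature, which is the sanity property one would want from any such aggregator.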

Abstract

Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, yet existing methods predominantly focus on the image level or rely heavily on costly identity annotations. While video-based VI-ReID has recently emerged to exploit temporal dynamics for improved robustness, existing studies remain limited to supervised settings. Crucially, the unsupervised video VI-ReID problem, where models must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real-world deployment. To bridge this gap, we propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework without explicit hard pseudo-label assignment for unsupervised video-based VI-ReID. HiTPro begins with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Building upon these features, HiTPro first constructs reliable intra-camera prototypes via Intra-Camera Tracklet Prototyping by aggregating features from temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, we perform a two-stage positive mining process, progressing from within-modality associations to cross-modality matching, enhanced by a Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus demonstrate that HiTPro achieves state-of-the-art performance under fully unsupervised settings, significantly outperforming adapted baselines and establishing a strong baseline for future research.
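The positive-mining step described above can be sketched as follows: cosine similarities between a tracklet feature and the current prototypes are thresholded dynamically (here, mean plus one standard deviation of the similarity distribution, an assumed form since the abstract does not give the exact rule), and the surviving candidates receive soft weights via a temperature-scaled softmax rather than a hard one-hot pseudo-label.

```python
import numpy as np

def mine_positives(query: np.ndarray, prototypes: np.ndarray, tau: float = 0.1):
    """Two-stage-style soft positive mining against a prototype bank.
    Returns indices of prototypes whose cosine similarity to `query`
    clears a dynamic threshold, plus softmax weights over them.
    The threshold (mean + std) and temperature are illustrative assumptions."""
    q = query / np.linalg.norm(query)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = P @ q                                   # cosine similarities
    thresh = sims.mean() + sims.std()              # dynamic threshold (assumed)
    idx = np.flatnonzero(sims >= thresh)           # candidate positives
    logits = sims[idx] / tau
    logits -= logits.max()                         # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()  # soft weight assignment
    return idx, weights

# Usage: the query matches exactly one of four orthogonal prototypes,
# so only that prototype clears the threshold and gets all the weight.
idx, w = mine_positives(np.array([1.0, 0.0, 0.0, 0.0]), np.eye(4))
# → idx == [0], w == [1.0]
```

In the full framework this mining would run first among same-modality prototypes and then across modalities, with the resulting soft weights feeding the three-level contrastive objectives; this sketch only shows the shared thresholding-plus-weighting mechanism.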