NearID: Identity Representation Learning via Near-identity Distractors

arXiv cs.CV / 4/3/2026


Key Points

  • The paper argues that standard vision encoders often entangle object identity with background context, causing unreliable evaluations for identity-focused tasks like personalized generation and image editing.
  • It proposes a Near-identity (NearID) distractor framework that places semantically similar but different instances onto the exact same background to prevent contextual shortcut learning and isolate identity.
  • The authors release the NearID dataset (19K identities and 316K matched-context distractors) along with a strict margin-based SSR evaluation protocol to better measure cross-view identity discrimination.
  • Experiments show that off-the-shelf pre-trained encoders can perform poorly (SSR as low as 30.7%), with distractors frequently ranked above true matches, motivating the method.
  • Using a two-tier contrastive objective on a frozen backbone, the approach raises SSR to 99.2%, improves part-level discrimination by 28.0%, and yields better alignment with human judgments on the human-aligned DreamBench++ benchmark.
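
The summary does not spell out the exact margin rule of the SSR protocol. A minimal sketch of one plausible reading, where a sample "succeeds" only if its true cross-view match beats every matched-context distractor by at least a margin, could look like the following (the function name, the `margin` value, and the similarity inputs are all illustrative assumptions, not the paper's definition):

```python
def sample_success_rate(sim_pos, sim_distractors, margin=0.05):
    """Hypothetical strict margin-based SSR.

    sim_pos: per-sample similarity of the anchor to its true
        cross-view match, e.g. [0.9, 0.6, ...].
    sim_distractors: per-sample list of similarities of the anchor
        to its matched-context distractors, e.g. [[0.5, 0.4], ...].
    A sample succeeds only if the true match exceeds the hardest
    distractor by at least `margin`.
    """
    successes = [
        p - max(ds) >= margin
        for p, ds in zip(sim_pos, sim_distractors)
    ]
    return sum(successes) / len(successes)

# Toy example: 3 samples, 2 distractors each; only the first
# sample clears the margin against its hardest distractor.
ssr = sample_success_rate(
    sim_pos=[0.9, 0.6, 0.8],
    sim_distractors=[[0.5, 0.4], [0.7, 0.3], [0.6, 0.76]],
)
```

Under this reading, a distractor ranked above (or within the margin of) the true match fails the whole sample, which is what makes the protocol "strict" rather than a plain top-1 retrieval accuracy.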

Abstract

Existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics for identity-focused tasks such as personalized generation and image editing. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) distractors: semantically similar but distinct instances placed on the exact same background as a reference image, eliminating contextual shortcuts and isolating identity as the sole discriminative signal. Based on this principle, we present the NearID dataset (19K identities, 316K matched-context distractors) together with a strict margin-based evaluation protocol. Under this setting, pre-trained encoders perform poorly: the Sample Success Rate (SSR), a strict margin-based identity discrimination metric, falls as low as 30.7%, and distractors are often ranked above true cross-view matches. We address this by learning identity-aware representations on a frozen backbone with a two-tier contrastive objective that enforces the hierarchy: same identity > NearID distractor > random negative. This improves SSR to 99.2%, enhances part-level discrimination by 28.0%, and yields stronger alignment with human judgments on DreamBench++, a human-aligned benchmark for personalization. Project page: https://gorluxor.github.io/NearID/
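The abstract only states the similarity hierarchy (same identity > NearID distractor > random negative), not the loss that enforces it. One common way to realize such an ordering is a pair of margin ranking terms, one per tier; the sketch below takes that route, with all names (`two_tier_margin_loss`, `m1`, `m2`) and the per-sample similarity inputs being illustrative assumptions rather than the paper's actual objective:

```python
def two_tier_margin_loss(s_pos, s_near, s_rand, m1=0.2, m2=0.2):
    """Hypothetical two-tier margin ranking loss for one anchor.

    s_pos:  similarity(anchor, same-identity view)
    s_near: similarity(anchor, NearID distractor)
    s_rand: similarity(anchor, random negative)
    Enforces s_pos > s_near by margin m1, and s_near > s_rand
    by margin m2, so the full hierarchy holds when the loss is 0.
    """
    # Tier 1: the true identity must beat the near-identity distractor.
    tier1 = max(0.0, m1 - (s_pos - s_near))
    # Tier 2: the near distractor must still beat a random negative,
    # keeping semantic neighbors closer than arbitrary images.
    tier2 = max(0.0, m2 - (s_near - s_rand))
    return tier1 + tier2

# Toy example: the hierarchy holds but tier 1's margin is not yet met.
loss = two_tier_margin_loss(s_pos=0.9, s_near=0.8, s_rand=0.3)
```

In practice such a loss would be averaged over a batch of (anchor, positive, NearID distractor, random negative) tuples and backpropagated through a lightweight head on top of the frozen backbone, consistent with the frozen-backbone setup the abstract describes.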
