Are We Recognizing the Jaguar or Its Background? A Diagnostic Framework for Jaguar Re-Identification

arXiv cs.CV / 4/14/2026


Key Points

  • The paper argues that jaguar re-identification systems can achieve high retrieval scores while relying on non-identity cues, such as background context or silhouette shape, rather than on coat patterns.
  • It proposes a two-axis diagnostic framework: a leakage-controlled context ratio (background vs foreground using inpainted background-only/foreground-only images) and a laterality diagnostic (cross-flank retrieval and mirror self-similarity).
  • To enable objective measurement of these diagnostics, the authors curate a Pantanal jaguar benchmark that includes per-pixel segmentation masks and an identity-balanced evaluation protocol.
  • As case studies, they evaluate several mitigation approaches (ArcFace fine-tuning, anti-symmetry regularization, and Lorentz hyperbolic embeddings) under the same diagnostic lens, assessing not just ranking performance but also the visual evidence each model relies on.
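
The context ratio in the second point can be illustrated with a small sketch. The summary does not give the paper's exact formula, so the version below is an assumption: it takes the ratio of rank-1 retrieval accuracy on background-only embeddings to rank-1 accuracy on foreground-only embeddings. The function names (`rank1_accuracy`, `context_ratio`) and the toy embeddings are hypothetical and stand in for the outputs of a real re-ID model on inpainted images.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rank1_accuracy(query_emb, query_ids, gallery_emb, gallery_ids):
    """Fraction of queries whose most similar gallery item shares their identity."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    nearest = np.argmax(sims, axis=1)
    return float(np.mean(np.asarray(gallery_ids)[nearest] == np.asarray(query_ids)))

def context_ratio(bg_score, fg_score, eps=1e-12):
    """Assumed definition: background-only score divided by foreground-only score."""
    return bg_score / max(fg_score, eps)

ids = [0, 1]  # two toy identities, shared by queries and gallery

# Toy embeddings of foreground-only images (animal kept, background inpainted).
fg_query = [[1.0, 0.0], [0.0, 1.0]]
fg_gallery = [[0.9, 0.1], [0.1, 0.9]]

# Toy embeddings of background-only images (animal inpainted away).
bg_query = [[1.0, 0.0], [1.0, 0.1]]
bg_gallery = [[1.0, 0.0], [0.0, 1.0]]

fg_score = rank1_accuracy(fg_query, ids, fg_gallery, ids)
bg_score = rank1_accuracy(bg_query, ids, bg_gallery, ids)
ratio = context_ratio(bg_score, fg_score)
print(f"foreground={fg_score:.2f} background={bg_score:.2f} ratio={ratio:.2f}")
```

Under this assumed definition, a ratio near zero would suggest that identity is recovered mostly from the animal itself, while a ratio approaching one would suggest that the background alone carries nearly as much identity signal as the coat.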

Abstract

Jaguar re-identification (re-ID) from citizen-science imagery can look strong on standard retrieval metrics while still relying on the wrong evidence, such as background context or silhouette shape, instead of the coat pattern that defines identity. We introduce a diagnostic framework for wildlife re-ID with two axes: a leakage-controlled context ratio (background/foreground), computed from inpainted background-only versus foreground-only images, and a laterality diagnostic based on cross-flank retrieval and mirror self-similarity. To make these diagnostics measurable, we curate a Pantanal jaguar benchmark with per-pixel segmentation masks and an identity-balanced evaluation protocol. We then use representative mitigation families (ArcFace fine-tuning, anti-symmetry regularization, and Lorentz hyperbolic embeddings) as case studies under the same evaluation lens. The goal is not only to ask which model ranks best, but also what visual evidence it uses to do so.
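
The mirror self-similarity half of the laterality diagnostic can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the measure is the mean cosine similarity between the embedding of an image and the embedding of its horizontal mirror. The two embedding functions are hypothetical toy stand-ins for a trained model, chosen so that one is sensitive to flipping and the other is not.

```python
import numpy as np

def cosine_rows(a, b):
    """Row-wise cosine similarity between two (N, D) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def mirror_self_similarity(embed, images):
    """Mean cosine similarity between each image and its horizontal mirror.

    `images` has shape (N, H, W); `embed` maps it to (N, D) embeddings.
    """
    flipped = images[:, :, ::-1]  # reverse the width axis
    return float(np.mean(cosine_rows(embed(images), embed(flipped))))

def embed_flatten(images):
    """Toy flip-sensitive embedder: keeps every pixel position."""
    return images.reshape(len(images), -1)

def embed_row_mean(images):
    """Toy flip-invariant embedder: averages over width, discarding left/right layout."""
    return images.mean(axis=2)

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))  # random stand-ins for flank crops

sim_sensitive = mirror_self_similarity(embed_flatten, images)
sim_invariant = mirror_self_similarity(embed_row_mean, images)
print(f"flip-sensitive: {sim_sensitive:.3f}  flip-invariant: {sim_invariant:.3f}")
```

One plausible reading is that a model whose mirror self-similarity sits close to 1 treats a flank and its mirror image as nearly the same, which would be at odds with identifying individuals by the spatial layout of the coat pattern. Cross-flank retrieval could be probed in the same spirit by querying left-flank images against a right-flank gallery of the same individuals.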