AI Navigate

Visual Product Search Benchmark

arXiv cs.CV / 3/19/2026

📰 News · Industry & Market Moves · Models & Research

Key Points

  • The article announces a new, structured benchmark for instance-level image retrieval using visual embedding models in industrial applications.
  • It evaluates a mix of open-source foundation embedding models, proprietary multi-modal systems, and domain-specific vision-only models under a unified image-to-image retrieval protocol without post-processing.
  • The benchmark incorporates industrial datasets from manufacturing, automotive, DIY, and retail alongside public benchmarks to assess transfer to fine-grained instance matching and compare with models trained for industrial tasks.
  • An interactive companion website at benchmark.nyris.io provides results, evaluation details, and visualizations to inform practitioners and researchers about strengths and limitations in production-level product identification systems.

Abstract

Reliable product identification from images is a critical requirement in industrial and commercial applications, particularly in maintenance, procurement, and operational workflows where incorrect matches can lead to costly downstream failures. At the core of such systems lies the visual search component, which must retrieve and rank the exact object instance from large and continuously evolving catalogs under diverse imaging conditions. This report presents a structured benchmark of modern visual embedding models for instance-level image retrieval, with a focus on industrial applications. A curated set of open-source foundation embedding models, proprietary multi-modal embedding systems, and domain-specific vision-only models is evaluated under a unified image-to-image retrieval protocol. The benchmark combines industrial datasets derived from production deployments in manufacturing, automotive, DIY, and retail with established public benchmarks. Evaluation is conducted without post-processing, isolating the retrieval capability of each model. The results provide insight into how well contemporary foundation and unified embedding models transfer to fine-grained instance retrieval tasks, and how they compare to models explicitly trained for industrial applications. By emphasizing realistic constraints, heterogeneous image conditions, and exact instance matching requirements, this benchmark aims to inform both practitioners and researchers about the strengths and limitations of current visual embedding approaches in production-level product identification systems. An interactive companion website presenting the benchmark results, evaluation details, and additional visualizations is available at https://benchmark.nyris.io.
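To make the evaluation protocol concrete: an image-to-image retrieval setup of this kind typically embeds both the query image and every catalog image with the same model, then ranks catalog items by cosine similarity, with no re-ranking or other post-processing. The sketch below illustrates that bare protocol with NumPy; the function name and the random stand-in embeddings are illustrative assumptions, not part of the benchmark's actual code, and in practice the vectors would come from one of the evaluated vision encoders.

```python
import numpy as np

def retrieve_top_k(query_emb, catalog_embs, k=5):
    """Rank catalog items by cosine similarity to the query embedding.

    No post-processing: the raw similarity ranking is the result,
    mirroring the 'retrieval capability only' evaluation described above.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    sims = c @ q                       # cosine similarity to each catalog item
    top = np.argsort(-sims)[:k]        # indices of the k most similar items
    return top, sims[top]

# Stand-in embeddings (hypothetical; a real system uses encoder outputs).
rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 512))            # 1000 catalog items, 512-d
query = catalog[42] + 0.05 * rng.normal(size=512) # noisy view of item 42

indices, scores = retrieve_top_k(query, catalog, k=5)
print(indices[0])  # the exact instance (item 42) should rank first
```

Exact-instance matching is what makes this harder than category-level retrieval: the correct result is one specific catalog entry, so metrics such as top-1 accuracy are judged against a single ground-truth item rather than any member of a class.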