VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection

arXiv cs.CV / 5/6/2026

Key Points

  • The paper introduces VL-SAM-v3, a unified approach to open-world object detection that works for both open-vocabulary and open-ended settings.
  • Instead of relying mainly on coarse text semantics and parametric knowledge, VL-SAM-v3 retrieves external visual prototypes from a non-parametric memory bank to build more reliable visual priors (a minimal retrieval sketch follows this list).
  • It transforms retrieved prototypes into two complementary priors: sparse priors for instance-level spatial anchoring and dense priors for class-aware local context.
  • The method integrates these priors into detection through Memory-Guided Prompt Refinement, using a shared retrieval-and-refinement mechanism during inference.
  • Zero-shot experiments on LVIS show consistent improvements in detection, with especially large gains for rare categories, and results with a stronger open-vocabulary detector (SAM3) confirm the generality of the retrieval-and-refinement design.
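
The retrieval step described above can be pictured as a nearest-neighbor lookup over stored exemplar embeddings. The following is a minimal sketch assuming an L2-normalized embedding bank scored by cosine similarity; the function names, shapes, and embedding dimension are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of non-parametric prototype retrieval. The memory bank is
# assumed to store L2-normalized visual embeddings per category; everything
# here is a stand-in, not VL-SAM-v3's actual implementation.
import torch
import torch.nn.functional as F


def build_memory_bank(features: dict[str, torch.Tensor]) -> tuple[torch.Tensor, list[str]]:
    """Stack per-category exemplar embeddings into one searchable matrix."""
    names, rows = [], []
    for category, embs in features.items():
        rows.append(F.normalize(embs, dim=-1))   # unit-norm rows for cosine scoring
        names.extend([category] * embs.shape[0])
    return torch.cat(rows, dim=0), names         # (N, D) bank plus parallel labels


def retrieve_prototypes(query: torch.Tensor, bank: torch.Tensor,
                        labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return the top-k most similar stored prototypes for one query embedding."""
    query = F.normalize(query, dim=-1)
    sims = bank @ query                          # cosine similarity against every row
    scores, idx = sims.topk(k)
    return [(labels[i], float(s)) for i, s in zip(idx.tolist(), scores.tolist())]


# Usage with random stand-in features (D = 256 is an assumption):
bank, labels = build_memory_bank({
    "zebra": torch.randn(8, 256),
    "unicycle": torch.randn(3, 256),             # rare category with few exemplars
})
print(retrieve_prototypes(torch.randn(256), bank, labels, k=3))
```

Because the bank is non-parametric, rare categories can be supported by simply adding a few exemplar embeddings, with no retraining, which is consistent with the strong rare-category gains the paper reports.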

Abstract

Open-world object detection aims to localize and recognize objects beyond a fixed closed-set label space. It is commonly divided into two settings, i.e., open-vocabulary detection, which assumes a predefined category list at test time, and open-ended detection, which requires generating candidate categories during inference. Existing methods rely primarily on coarse textual semantics and parametric knowledge, which often provide insufficient visual evidence for fine-grained appearance variation, rare categories, and cluttered scenes. In this paper, we propose VL-SAM-v3, a unified framework that augments open-world detection with retrieval-grounded external visual memory. Specifically, once candidate categories are available, VL-SAM-v3 retrieves relevant visual prototypes from a non-parametric memory bank and transforms them into two complementary visual priors, i.e., sparse priors for instance-level spatial anchoring and dense priors for class-aware local context. These priors are integrated with the original detection prompts via Memory-Guided Prompt Refinement, enabling a shared retrieval-and-refinement mechanism that supports both open-vocabulary and open-ended inference. Extensive zero-shot experiments on LVIS show that VL-SAM-v3 consistently improves detection performance under both open-vocabulary and open-ended inference, with particularly strong gains on rare categories. Moreover, experiments with a stronger open-vocabulary detector (i.e., SAM3) validate the generality of the proposed retrieval-and-refinement mechanism.
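
To make the two priors concrete, the sketch below shows one plausible reading of the abstract: a dense prior as a per-pixel prototype-similarity map, a sparse prior as the top-responding point anchors extracted from that map, and prompt refinement as residual cross-attention from prompts to prior tokens. Correlation and cross-attention are stand-in operators chosen for illustration; the paper's actual modules may differ.

```python
# Hedged sketch of prior construction and prompt refinement; all shapes,
# module choices, and hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F


def dense_prior(image_feats: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Class-aware local context as a per-pixel cosine-similarity map.

    image_feats: (D, H, W) backbone features; prototypes: (K, D).
    Returns a (K, H, W) map of local class evidence.
    """
    d, h, w = image_feats.shape
    flat = F.normalize(image_feats.reshape(d, -1), dim=0)     # (D, H*W), unit-norm pixels
    protos = F.normalize(prototypes, dim=-1)                  # (K, D)
    return (protos @ flat).reshape(-1, h, w)


def sparse_prior(sim_map: torch.Tensor, num_anchors: int = 4) -> torch.Tensor:
    """Instance-level spatial anchoring: strongest response locations per class."""
    k, _, w = sim_map.shape
    _, idx = sim_map.reshape(k, -1).topk(num_anchors, dim=-1)  # (K, A) flat indices
    ys, xs = idx // w, idx % w
    return torch.stack([xs, ys], dim=-1).float()               # (K, A, 2) point anchors


class PromptRefiner(torch.nn.Module):
    """One residual cross-attention step: prompts attend to prior tokens."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, prompts: torch.Tensor, prior_tokens: torch.Tensor) -> torch.Tensor:
        refined, _ = self.attn(prompts, prior_tokens, prior_tokens)
        return prompts + refined   # refine rather than replace the original prompts


# Usage with stand-in tensors (all shapes are assumptions):
feats = torch.randn(256, 32, 32)                 # backbone feature map
protos = torch.randn(5, 256)                     # retrieved prototypes, K = 5
dmap = dense_prior(feats, protos)                # (5, 32, 32)
anchors = sparse_prior(dmap)                     # (5, 4, 2)
refined = PromptRefiner()(torch.randn(1, 10, 256), protos.unsqueeze(0))
```

In the pipeline the abstract describes, the refined prompts would then drive the downstream detector head under either inference setting; the residual cross-attention above only illustrates how external visual evidence could be fused with the original prompts.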