NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection

arXiv cs.CV / 3/24/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper identifies a key train–test mismatch in open-vocabulary object detection, where unlabeled novel-category objects are repeatedly treated as background and filtered out early, reducing recall at inference.
It introduces NoOVD, a training framework that uses self-distillation from frozen vision-language models (VLMs) to help the detector discover novel categories without needing extra data.
A proposed component (K-FPN) transfers VLM knowledge to guide novel-category discovery and avoid forced alignment of novel objects with background.
During inference, the method adds R-RPN to adjust proposal confidence scores to improve the recall of novel-category objects.
Experiments across OV-LVIS, OV-COCO, and Objects365 show consistent performance improvements across multiple evaluation metrics.

Abstract

Despite the remarkable progress in open-vocabulary object detection (OVD), a significant gap remains between the training and testing phases. During training, the RPN and RoI heads often misclassify unlabeled novel-category objects as background, causing some proposals to be prematurely filtered out by the RPN while others are further misclassified by the RoI head. During testing, these proposals again receive low scores and are removed in post-processing, leading to a significant drop in recall and ultimately weakening novel-category detection performance.To address these issues, we propose a novel training framework-NoOVD-which innovatively integrates a self-distillation mechanism grounded in the knowledge of frozen vision-language models (VLMs). Specifically, we design K-FPN, which leverages the pretrained knowledge of VLMs to guide the model in discovering novel-category objects and facilitates knowledge distillation-without requiring additional data-thus preventing forced alignment of novel objects with background.Additionally, we introduce R-RPN, which adjusts the confidence scores of proposals during inference to improve the recall of novel-category objects. Cross-dataset evaluations on OV-LVIS, OV-COCO, and Objects365 demonstrate that our approach consistently achieves superior performance across multiple metrics.

Regulating Prompt Markets: Securities Law, Intellectual Property, and the Trading of Prompt Assets

Dev.to

Mercor competitor Deccan AI raises $25M, sources experts from India

Dev.to

How We Got Local MCP Servers Working in Claude Cowork (The Missing Guide)

Dev.to

How Should Students Document AI Usage in Academic Work?

Dev.to

They Did Not Accidentally Make Work the Answer to Who You Are

Dev.to

NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection

Key Points

Abstract

Related Articles

Regulating Prompt Markets: Securities Law, Intellectual Property, and the Trading of Prompt Assets

Mercor competitor Deccan AI raises $25M, sources experts from India

How We Got Local MCP Servers Working in Claude Cowork (The Missing Guide)

How Should Students Document AI Usage in Academic Work?

They Did Not Accidentally Make Work the Answer to Who You Are

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer