AI Navigate

Out-of-Distribution Object Detection in Street Scenes via Synthetic Outlier Exposure and Transfer Learning

arXiv cs.CV / 3/18/2026


Key Points

  • SynOE-OD enables a single detector to localize both in-distribution (ID) and out-of-distribution (OOD) objects, treating the two within one unified framework for street-scene detection.
  • It uses synthetic outlier exposure by leveraging strong generative models (e.g., Stable Diffusion) and open-vocabulary detectors to create semantically meaningful outliers for training.
  • The approach applies transfer learning on the generated data to maintain strong ID task performance while adding OOD detection robustness.
  • It achieves state-of-the-art average precision on an established OOD object detection benchmark, where open-vocabulary detectors (OVODs) such as GroundingDINO show limited zero-shot performance at detecting OOD objects in street scenes.
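The paper's code is not reproduced here; as a rough illustration of what "treating ID and OOD in a unified framework" with outlier exposure can mean, the sketch below assumes a classification head with K in-distribution classes plus one extra outlier slot, so synthetic outliers are simply labeled with that extra class index during training. All names and the exact loss form are hypothetical, not taken from the paper:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def unified_ce_loss(logits, labels):
    """Cross-entropy over K ID classes plus one extra 'outlier' class.

    Synthetic outliers are labeled with the extra class index, so a
    single classification head scores ID and OOD boxes uniformly --
    no auxiliary OOD branch is needed in this toy formulation.
    """
    probs = softmax(logits)
    n = logits.shape[0]
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

# Toy batch: 3 ID classes + 1 outlier slot (index 3).
logits = np.array([
    [4.0, 0.1, 0.1, 0.1],   # box confidently predicted as ID class 0
    [0.1, 0.1, 0.1, 4.0],   # synthetic outlier predicted as OOD
])
labels = np.array([0, 3])   # second box carries the outlier label
loss = unified_ce_loss(logits, labels)
```

In an actual detector this loss would sit on top of the box-classification head, with the synthetic, generatively produced crops supplying the outlier-labeled boxes.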

Abstract

Out-of-distribution (OOD) object detection is an important yet underexplored task. A reliable object detector should handle OOD objects by localizing them and correctly classifying them as OOD. A critical issue arises, however, when such atypical objects are completely missed by the detector and incorrectly treated as background. Existing OOD detection approaches in object detection often rely on complex architectures or auxiliary branches and typically do not provide a framework that treats in-distribution (ID) and OOD objects in a unified way. In this work, we address these limitations by enabling a single detector to detect OOD objects that would otherwise be silently overlooked, alongside ID objects. We present SynOE-OD, a Synthetic Outlier-Exposure-based Object Detection framework that leverages strong generative models, such as Stable Diffusion, and Open-Vocabulary Object Detectors (OVODs) to generate semantically meaningful, object-level data that serve as outliers during training. The generated data are used for transfer learning to establish strong ID task performance and to equip detection models with OOD object detection robustness. Our approach achieves state-of-the-art average precision on an established OOD object detection benchmark, where OVODs such as GroundingDINO show limited zero-shot performance in detecting OOD objects in street scenes.
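At inference time, a unified head of the kind described above makes the ID/OOD decision trivial: a detected box is OOD exactly when the extra outlier slot wins the argmax, with no separate score threshold or auxiliary branch. This is a minimal sketch of that decision rule under the same hypothetical K+1-class assumption, not the paper's actual inference procedure:

```python
import numpy as np

def detect_decision(logits, num_id_classes):
    """Classify one detected box with a unified (K+1)-way head.

    Returns ('id', class_idx) when an ID class wins the argmax,
    or ('ood', None) when the extra outlier slot (index K) wins.
    """
    k = int(np.argmax(logits))
    if k == num_id_classes:
        return ("ood", None)
    return ("id", k)

# Toy boxes over 3 ID classes + 1 outlier slot.
id_box  = detect_decision(np.array([2.0, 0.1, 0.1, 0.1]), num_id_classes=3)
ood_box = detect_decision(np.array([0.1, 0.1, 0.1, 5.0]), num_id_classes=3)
```

Because the outlier class was trained on semantically meaningful synthetic objects rather than random noise, such a rule can also recover atypical objects that a plain detector would suppress as background.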