Can VLMs Unlock Semantic Anomaly Detection? A Framework for Structured Reasoning

arXiv cs.RO / 4/9/2026


Key Points

  • The paper argues that autonomous driving systems are highly vulnerable to rare out-of-distribution semantic anomalies and that current VLM-based anomaly detection is often limited to ad hoc prompting of proprietary models.
  • It introduces SAVANT, a model-agnostic, structured reasoning framework that decomposes anomaly detection into layered semantic consistency verification using two phases: structured scene description extraction and multimodal evaluation.
  • Experiments on a balanced set of real-world driving scenarios show that SAVANT improves VLM anomaly detection, raising absolute recall by about 18.5% over prompting baselines.
  • Using the framework with the best-performing proprietary model, the authors automatically label around 10,000 images to build a high-confidence dataset, addressing data scarcity for anomaly detection.
  • They fine-tune a 7B open-source model (Qwen2.5-VL) for single-shot anomaly detection, reporting 90.8% recall and 93.8% accuracy and enabling near-zero-cost local deployment.
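The two-phase pipeline described in the key points can be sketched as orchestration code. This is an illustrative sketch only: the function names, the four domain labels, and the pass/fail consistency check are assumptions for exposition, not the paper's actual API or domain taxonomy.

```python
# Hypothetical sketch of SAVANT-style layered semantic consistency
# verification. The VLM calls are passed in as callables so the sketch
# stays model-agnostic, mirroring the framework's stated design goal.
from dataclasses import dataclass
from typing import Callable, Dict

# Placeholder names for the four semantic domains; the paper decomposes
# reasoning into four domains, but these labels are assumptions.
DOMAINS = ["objects", "spatial_layout", "agent_behavior", "environment"]

@dataclass
class SceneDescription:
    per_domain: Dict[str, str]  # structured text extracted per domain

def detect_anomaly(
    image: bytes,
    describe: Callable[[bytes, str], str],        # Phase 1: extract a domain-specific description
    evaluate: Callable[[bytes, str, str], bool],  # Phase 2: check image/description consistency
) -> bool:
    """Return True if any semantic domain fails consistency verification."""
    # Phase 1: structured scene description extraction, one pass per domain.
    desc = SceneDescription({d: describe(image, d) for d in DOMAINS})
    # Phase 2: multimodal evaluation; the scene is anomalous if any
    # domain's description is inconsistent with the image.
    return any(not evaluate(image, d, desc.per_domain[d])
               for d in DOMAINS)
```

With stub callables in place of real VLM calls, a scene whose every domain verifies comes back normal, while a single failing domain flags the whole scene as anomalous.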

Abstract

Autonomous driving systems remain critically vulnerable to the long tail of rare, out-of-distribution semantic anomalies. While VLMs have emerged as promising tools for perception, their application in anomaly detection remains largely restricted to prompting proprietary models, limiting reliability, reproducibility, and deployment feasibility. To address this gap, we introduce SAVANT (Semantic Anomaly Verification/Analysis Toolkit), a novel model-agnostic reasoning framework that reformulates anomaly detection as layered semantic consistency verification. By applying SAVANT's two-phase pipeline (structured scene description extraction followed by multimodal evaluation), existing VLMs achieve significantly higher scores in detecting anomalous driving scenarios from input images. Our approach replaces ad hoc prompting with semantic-aware reasoning, transforming VLM-based detection into a principled decomposition across four semantic domains. We show that across a balanced set of real-world driving scenarios, applying SAVANT improves VLMs' absolute recall by approximately 18.5% compared to prompting baselines. Moreover, this gain enables reliable large-scale annotation: leveraging the best proprietary model within our framework, we automatically labeled around 10,000 real-world images with high confidence. We use the resulting high-quality dataset to fine-tune a 7B open-source model (Qwen2.5-VL) to perform single-shot anomaly detection, achieving 90.8% recall and 93.8% accuracy, surpassing all models evaluated while enabling local deployment at near-zero cost. By coupling structured semantic reasoning with scalable data curation, SAVANT provides a practical solution to data scarcity in semantic anomaly detection for autonomous systems. Supplementary material: https://SAV4N7.github.io