UAV traffic scene understanding: A cross-spectral guided approach and a unified benchmark

arXiv cs.CV / 3/12/2026

Key Points

  • The paper introduces a Cross-spectral Traffic Cognition Network (CTCNet) designed for robust UAV traffic scene understanding by fusing optical and thermal modalities to handle adverse illumination.
  • It features a Prototype-Guided Knowledge Embedding (PGKE) module that uses external Traffic Regulation Memory (TRM) prototypes to ground visual representations with domain-specific regulatory knowledge for recognizing complex traffic behaviors.
  • It also includes a Quality-Aware Spectral Compensation (QASC) module that enables bidirectional context exchange between optical and thermal streams to mitigate degraded features in challenging environments.
  • The authors release Traffic-VQA, the first large-scale optical-thermal UAV traffic understanding benchmark (8,180 image pairs and 1.3 million QA pairs across 31 types), and report that CTCNet significantly outperforms state-of-the-art methods; the dataset is publicly available on GitHub.
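The paper describes PGKE only at a high level. One plausible reading is a cross-attention lookup in which visual tokens query the Traffic Regulation Memory prototypes and fold the retrieved knowledge back in as a residual. The sketch below illustrates that pattern only; all names, shapes, and the residual design are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prototype_guided_embedding(visual_tokens, trm_prototypes):
    """Hypothetical PGKE-style lookup: visual tokens attend over
    regulation-memory prototypes; retrieved knowledge is added back
    as a residual to ground the visual features."""
    d = visual_tokens.shape[-1]
    attn = softmax(visual_tokens @ trm_prototypes.T / np.sqrt(d))
    knowledge = attn @ trm_prototypes      # (N, d) retrieved knowledge
    return visual_tokens + knowledge       # knowledge-grounded tokens

# Toy shapes for illustration (e.g. one prototype per QA type).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 64))     # 16 visual tokens, dim 64
protos = rng.standard_normal((31, 64))     # 31 TRM prototypes
out = prototype_guided_embedding(tokens, protos)
print(out.shape)  # → (16, 64)
```

The attention weights form a distribution over prototypes per token, so each token is softly matched against regulatory knowledge rather than hard-classified.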

Abstract

Traffic scene understanding from unmanned aerial vehicle (UAV) platforms is crucial for intelligent transportation systems due to its flexible deployment and wide-area monitoring capabilities. However, existing methods face significant challenges in real-world surveillance, as their heavy reliance on optical imagery leads to severe performance degradation under adverse illumination conditions like nighttime and fog. Furthermore, current Visual Question Answering (VQA) models are restricted to elementary perception tasks, lacking the domain-specific regulatory knowledge required to assess complex traffic behaviors. To address these limitations, we propose a novel Cross-spectral Traffic Cognition Network (CTCNet) for robust UAV traffic scene understanding. Specifically, we design a Prototype-Guided Knowledge Embedding (PGKE) module that leverages high-level semantic prototypes from an external Traffic Regulation Memory (TRM) to anchor domain-specific knowledge into visual representations, enabling the model to comprehend complex behaviors and distinguish fine-grained traffic violations. Moreover, we develop a Quality-Aware Spectral Compensation (QASC) module that exploits the complementary characteristics of optical and thermal modalities to perform bidirectional context exchange, effectively compensating for degraded features to ensure robust representation in complex environments. In addition, we construct Traffic-VQA, the first large-scale optical-thermal infrared benchmark for cognitive UAV traffic understanding, comprising 8,180 aligned image pairs and 1.3 million question-answer pairs across 31 diverse types. Extensive experiments demonstrate that CTCNet significantly outperforms state-of-the-art methods in both cognition and perception scenarios. The dataset is available at https://github.com/YuZhang-2004/UAV-traffic-scene-understanding.
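The QASC module's "bidirectional context exchange" can be pictured as a quality-gated mixing of the two streams: each modality estimates how reliable its own features are and borrows from the other where it is weak (optical at night, for instance). This is a minimal sketch of that idea under assumed shapes and a linear quality estimator; none of it is the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quality_aware_compensation(opt_feat, thr_feat, w_opt, w_thr):
    """Hypothetical QASC-style exchange: each stream gates itself by a
    per-token quality score in (0, 1) and is compensated by the other
    stream where its own features appear degraded."""
    q_opt = sigmoid(opt_feat @ w_opt)            # (N, 1) optical quality
    q_thr = sigmoid(thr_feat @ w_thr)            # (N, 1) thermal quality
    opt_out = q_opt * opt_feat + (1.0 - q_opt) * thr_feat
    thr_out = q_thr * thr_feat + (1.0 - q_thr) * opt_feat
    return opt_out, thr_out

# Toy run with random features and quality-projection weights.
rng = np.random.default_rng(1)
opt = rng.standard_normal((16, 64))
thr = rng.standard_normal((16, 64))
w_o = rng.standard_normal((64, 1))
w_t = rng.standard_normal((64, 1))
opt_c, thr_c = quality_aware_compensation(opt, thr, w_o, w_t)
print(opt_c.shape, thr_c.shape)  # → (16, 64) (16, 64)
```

Because the gates lie in (0, 1), each compensated feature is a convex combination of the two modalities, so a degraded stream is pulled toward its healthier counterpart rather than overwritten.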