Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion

arXiv cs.CV · April 29, 2026


Key Points

  • The paper introduces a new camera–radar fusion paradigm called heterogeneous query interaction for autonomous driving, aiming to improve both sensing complementarity and deployment practicality.
  • It presents ConFusion, a 3D object detector that combines multiple query types—image queries, radar queries, and learnable world queries distributed in 3D space—to improve query initialization and object coverage.
  • To strengthen interaction across query types, the authors propose heterogeneous query mixing (QMix), which applies dedicated cross-type attention after feature sampling to consolidate complementary evidence.
  • They further introduce interactive query swap sampling (QSwap), enabling related queries to exchange informative feature tokens while respecting attention and geometric constraints to improve sampling quality.
  • On nuScenes, ConFusion reports state-of-the-art results with 59.1 mAP / 65.6 NDS on the validation set and 61.6 mAP / 67.9 NDS on the test set.

Abstract

In autonomous driving, camera-radar fusion offers complementary sensing and low deployment cost. Existing methods perform fusion through input mixing, feature map mixing, or query-based feature sampling. We propose a new fusion paradigm, termed heterogeneous query interaction, and present ConFusion, a camera-radar 3D object detector. ConFusion combines image queries, radar queries, and learnable world queries distributed in 3D space to improve query initialization and object coverage. To encourage cross-type interaction among heterogeneous queries, we introduce heterogeneous query mixing (QMix), which performs dedicated cross-type attention after feature sampling to consolidate complementary object evidence. We further propose interactive query swap sampling (QSwap), which improves feature sampling by allowing related queries to exchange informative feature tokens under attention and geometric constraints. Experiments on the nuScenes dataset show that ConFusion achieves state-of-the-art performance, reaching 59.1 mAP and 65.6 NDS on the validation set, and 61.6 mAP and 67.9 NDS on the test set.
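The abstract describes three heterogeneous query types whose features are consolidated through dedicated cross-type attention (QMix) after sampling. The paper's actual architecture is not given here, so the following is only a minimal NumPy sketch of the general idea: each query set attends over the pooled set of all query types with single-head scaled dot-product attention, so that image, radar, and world queries can absorb complementary evidence from one another. All dimensions, query counts, and the residual-update form are illustrative assumptions, not the authors' design.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_type_attention(q, kv, dim):
    # q attends over kv (single-head scaled dot-product attention);
    # stands in for the paper's dedicated cross-type attention
    scores = q @ kv.T / np.sqrt(dim)
    return softmax(scores, axis=-1) @ kv

d = 32  # feature dimension (illustrative)
img_q = rng.normal(size=(40, d))    # image-initialized queries
rad_q = rng.normal(size=(30, d))    # radar-initialized queries
world_q = rng.normal(size=(20, d))  # learnable world queries in 3D space

# Each query type consolidates evidence from all types (residual update),
# then the mixed queries are pooled for downstream decoding
all_q = np.concatenate([img_q, rad_q, world_q], axis=0)
mixed = np.concatenate(
    [q + cross_type_attention(q, all_q, d) for q in (img_q, rad_q, world_q)],
    axis=0,
)
print(mixed.shape)  # (90, 32)
```

In a real detector these would be learned projections (Q/K/V) inside a transformer decoder, and QSwap would additionally exchange sampled feature tokens between related queries under attention and geometric constraints; this sketch only conveys how heterogeneous query sets can interact through shared attention.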