AI Navigate

GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

arXiv cs.AI / 3/23/2026


Key Points

  • GT-Space proposes a flexible framework that creates a common ground-truth feature space to align heterogeneous agent features for collaborative perception in autonomous driving.
  • The design enables each agent to use a single adapter to project its features into the shared space, eliminating the need for costly pairwise interactions with other agents.
  • A fusion network trained with contrastive losses across diverse modalities improves detection accuracy on simulation datasets (OPV2V and V2XSet) and a real-world dataset (RCooper).
  • The work claims scalable handling of heterogeneity and reports empirical gains over baselines, with code to be released on GitHub.
  • By decoupling feature alignment from specific sensor/model architectures, GT-Space aims to simplify integration of heterogeneous agents in cooperative perception systems.
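The one-adapter-per-agent idea in the points above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the feature dimensions, the linear form of the adapters, and the mean-pooling fusion are all assumptions made for the example. The key property it demonstrates is that N heterogeneous agents need only N adapters into the shared space, rather than O(N²) pairwise interpreters.

```python
import numpy as np

rng = np.random.default_rng(0)
D_SHARED = 64  # assumed dimension of the common feature space


class Adapter:
    """One lightweight projection per agent into the shared space (toy linear version)."""

    def __init__(self, d_in, d_out=D_SHARED):
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)

    def __call__(self, feats):
        return feats @ self.W


# Heterogeneous agents: different encoders yield different feature dimensions.
lidar_feats = rng.standard_normal((100, 128))   # e.g. LiDAR BEV features
camera_feats = rng.standard_normal((100, 256))  # e.g. camera BEV features

# Each agent needs only its own adapter (N adapters total), never one per agent pair.
adapters = {"lidar": Adapter(128), "camera": Adapter(256)}
aligned = [adapters["lidar"](lidar_feats), adapters["camera"](camera_feats)]

# Placeholder fusion; GT-Space trains a dedicated fusion network instead.
fused = np.mean(aligned, axis=0)
assert fused.shape == (100, D_SHARED)
```

Adding a third agent with yet another encoder only requires training one more adapter against the fixed shared space, which is the scalability claim in the bullets above.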

Abstract

In autonomous driving, multi-agent collaborative perception enhances sensing capabilities by enabling agents to share perceptual data. A key challenge lies in handling *heterogeneous* features from agents equipped with different sensing modalities or model architectures, which complicates data fusion. Existing approaches often require retraining encoders or designing interpreter modules for pairwise feature alignment, but these solutions are not scalable in practice. To address this, we propose *GT-Space*, a flexible and scalable collaborative perception framework for heterogeneous agents. GT-Space constructs a common feature space from ground-truth labels, providing a unified reference for feature alignment. With this shared space, agents only need a single adapter module to project their features, eliminating the need for pairwise interactions with other agents. Furthermore, we design a fusion network trained with contrastive losses across diverse modality combinations. Extensive experiments on simulation datasets (OPV2V and V2XSet) and a real-world dataset (RCooper) demonstrate that GT-Space consistently outperforms baselines in detection accuracy while delivering robust performance. Our code will be released at https://github.com/KingScar/GT-Space.
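The "contrastive losses across diverse modality combinations" mentioned in the abstract are commonly instantiated as an InfoNCE-style objective that pulls matched features from two modalities together and pushes mismatched ones apart. The sketch below shows that generic pattern only; the temperature value, the pairing scheme, and the use of plain numpy are assumptions, not the paper's exact loss.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a should match row i of z_b (positive pair);
    every other row in z_b serves as a negative. Inputs are (N, D) matrices."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy with identity targets


rng = np.random.default_rng(1)
anchor = rng.standard_normal((32, 64))
# Well-aligned features from another modality: noisy copies of the anchor rows.
positive = anchor + 0.05 * rng.standard_normal((32, 64))
# Unaligned features: unrelated random vectors.
unrelated = rng.standard_normal((32, 64))

# Aligned pairs should incur a much lower loss than unrelated ones.
assert info_nce(anchor, positive) < info_nce(anchor, unrelated)
```

Training adapters and the fusion network against such an objective over many modality pairings encourages all agents' projected features to agree in the shared space.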