BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception

arXiv cs.CV / 5/4/2026


Key Points

  • The paper introduces “preparation-free” heterogeneous cooperative perception, targeting scenarios where agents meet online and cannot rely on offline joint training or tailored adaptation.
  • It finds that straightforward cross-agent feature fusion can perform worse than ego-only perception when no prior coordination is available.
  • To address this, the authors propose BOLT, a lightweight plug-and-play module that performs online adaptation using ego-as-teacher distillation without requiring ground-truth labels.
  • BOLT uses high-confidence ego features to align neighbor features to the ego's feature domain, and lets neighbors contribute mainly in the ego's low-confidence regions.
  • Experiments show AP@50 gains of up to 32.3 points over vanilla unadapted fusion with only 0.9M trainable parameters, consistently outperforming ego-only baselines on DAIR-V2X and OPV2V.
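
The ego-as-teacher idea in the key points can be illustrated with a minimal sketch: align neighbor features to ego features only where the ego detector is confident, so no ground-truth labels are needed. The function name, threshold `tau`, and tensor shapes below are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def ego_teacher_distill_loss(ego_feat, nbr_feat, ego_conf, tau=0.7):
    """Masked feature-alignment loss (hypothetical sketch).

    ego_feat, nbr_feat: (C, H, W) BEV feature maps.
    ego_conf: (H, W) ego detection confidence in [0, 1].
    Only locations where the ego is confident (conf > tau) supervise
    the neighbor features -- the ego acts as the teacher.
    """
    mask = (ego_conf > tau).astype(ego_feat.dtype)   # (H, W) binary mask
    per_pixel = ((nbr_feat - ego_feat) ** 2).mean(axis=0)  # (H, W) MSE over channels
    return float((per_pixel * mask).sum() / (mask.sum() + 1e-8))

# Toy check: neighbor matches ego in the confident region -> zero loss,
# even though features disagree elsewhere.
rng = np.random.default_rng(0)
ego = rng.normal(size=(8, 4, 4)).astype(np.float32)
conf = np.zeros((4, 4), dtype=np.float32)
conf[:2] = 0.9                     # only the top rows are high-confidence
nbr = ego.copy()
nbr[:, 2:] += 1.0                  # mismatch confined to low-confidence rows
print(ego_teacher_distill_loss(ego, nbr, conf))  # → 0.0
```

Because the mask gates the loss, disagreement in the ego's low-confidence regions is not penalized, which is what leaves room for neighbors to add information there.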

Abstract

Most existing heterogeneous cooperative perception methods depend on prior preparation like offline joint training or tailored collaborator-model adaptation. Such preprocessing is, however, generally impractical in real scenarios, as agents are usually independently trained by different developers and meet occasionally online. This work investigates *preparation-free heterogeneous cooperative perception*, where agents use independently trained single-agent detectors without any pre-deployment coordination. We find direct cross-agent fusion under this setting greatly underperforms ego-only perception. We present BOLT, a lightweight plug-and-play module that adapts neighboring features online via ego-as-teacher distillation, requiring only ego predictions without ground-truth labels. BOLT leverages high-confidence ego perception features to guide cross-agent feature-domain alignment, while enabling neighbors to contribute features in the ego's low-confidence regions. With only 0.9M trainable parameters, BOLT improves AP@50 by up to 32.3 points over vanilla unadapted fusion in the preparation-free setting. It consistently outperforms ego-only results on DAIR-V2X and OPV2V, across different encoder pairs and fusion strategies. Code: https://github.com/sidiangongyuan/BOLT.
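
The abstract's second mechanism, letting neighbors contribute in the ego's low-confidence regions, amounts to a confidence-gated blend of the two feature maps. The weighting scheme and names below are a hypothetical sketch, assuming per-location ego confidence in [0, 1]; the paper's actual fusion may differ.

```python
import numpy as np

def confidence_gated_fusion(ego_feat, nbr_feat, ego_conf):
    """Blend features per spatial location (illustrative sketch).

    Trust the ego feature where its confidence is high, and let the
    (adapted) neighbor feature fill in where ego confidence is low.
    ego_feat, nbr_feat: (C, H, W); ego_conf: (H, W) in [0, 1].
    """
    w = ego_conf[None, :, :]               # (1, H, W), broadcast over channels
    return w * ego_feat + (1.0 - w) * nbr_feat

# Toy example: constant feature maps make the gating easy to read off.
ego = np.ones((2, 2, 2), dtype=np.float32)          # ego features = 1
nbr = np.full((2, 2, 2), 3.0, dtype=np.float32)     # neighbor features = 3
conf = np.array([[1.0, 0.0],
                 [0.5, 1.0]], dtype=np.float32)
fused = confidence_gated_fusion(ego, nbr, conf)
# fused[:, 0, 0] = 1.0 (pure ego), fused[:, 0, 1] = 3.0 (pure neighbor),
# fused[:, 1, 0] = 2.0 (even blend)
```

A hard mask (neighbors only where confidence falls below a threshold) is an equally plausible reading of the abstract; the soft blend is shown here because it degrades gracefully at intermediate confidence.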