Unsupervised Multi-agent and Single-agent Perception from Cooperative Views

arXiv cs.CV / 4/8/2026


Key Points

  • The paper addresses a gap in LiDAR perception by proposing an unsupervised framework that can jointly handle multi-agent and single-agent 3D perception without human annotations.
  • It identifies two core benefits of cooperative sensor sharing: denser point clouds from multiple agents improve unsupervised object classification, and multi-agent cooperative views can provide unsupervised guidance for single-view 3D object detection.
  • The proposed UMS framework uses a Proposal Purifying Filter to refine candidate proposals after density cooperation, a Progressive Proposal Stabilizing module to generate reliable pseudo-labels via easy-to-hard curriculum learning, and Cross-View Consensus Learning to transfer cooperative guidance to single-agent detection.
  • Experiments on V2V4Real and OPV2V show UMS achieves significantly better 3D detection performance than prior state-of-the-art methods in an unsupervised setting, on both the multi-agent and the single-agent perception task.
  • Overall, the work suggests that cross-agent communication plus consensus learning can reduce reliance on labeled data for real-world robotic and automated-vehicle perception pipelines.
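
The density-cooperation idea in the key points above amounts to warping each neighbor's LiDAR points into the ego frame and concatenating them. A minimal sketch, assuming shared poses are available as homogeneous transforms (the function name and toy geometry are illustrative, not from the paper):

```python
import numpy as np

def merge_agent_clouds(ego_points, other_points, T_other_to_ego):
    """Transform another agent's LiDAR points into the ego frame and
    concatenate them, yielding a denser cooperative point cloud.

    ego_points, other_points: (N, 3) arrays of xyz coordinates.
    T_other_to_ego: (4, 4) homogeneous transform from the other agent's
    sensor frame into the ego frame (derived from shared poses).
    """
    homo = np.hstack([other_points, np.ones((len(other_points), 1))])
    warped = (T_other_to_ego @ homo.T).T[:, :3]
    return np.vstack([ego_points, warped])

# Toy example: the cooperating agent sits 10 m ahead of the ego vehicle.
ego = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
other = np.array([[0.0, 1.0, 0.0]])
T = np.eye(4)
T[0, 3] = 10.0  # translate the other agent's frame 10 m along x
merged = merge_agent_clouds(ego, other, T)
print(merged.shape)  # (3, 3): the cooperative cloud is denser than either view
```

The denser merged cloud is what makes the unsupervised object-classification step more reliable, since sparse single-view clusters become better-sampled objects after fusion.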

Abstract

LiDAR-based multi-agent and single-agent perception has shown promising performance in environmental understanding for robots and automated vehicles. However, no existing method solves both multi-agent and single-agent perception simultaneously in an unsupervised way. By sharing sensor data between multiple agents via communication, this paper uncovers two key insights: 1) the improved point cloud density obtained by sharing data across cooperative views can benefit unsupervised object classification, and 2) the cooperative view of multiple agents can serve as unsupervised guidance for 3D object detection in the single view. Based on these two insights, we propose an Unsupervised Multi-agent and Single-agent (UMS) perception framework that leverages multi-agent cooperation, without human annotations, to solve multi-agent and single-agent perception simultaneously. UMS combines a learning-based Proposal Purifying Filter, which better classifies candidate proposals after multi-agent point cloud density cooperation, with a Progressive Proposal Stabilizing module that yields reliable pseudo-labels through easy-to-hard curriculum learning. Furthermore, we design Cross-View Consensus Learning to use the multi-agent cooperative view to guide detection in the single-agent view. Experimental results on two public datasets, V2V4Real and OPV2V, show that our UMS method achieves significantly higher 3D detection performance than state-of-the-art methods on both multi-agent and single-agent perception tasks in an unsupervised setting.