Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks
arXiv cs.CV / 3/19/2026
Key Points
- The paper introduces a Base-Station-Helped UAV (BHU) framework to enable communication-efficient multi-UAV cooperative perception in low-altitude wireless networks.
- It uses Top-K pixel selection to sparsify UAV-captured RGB images, transmitting only the most informative pixels to a ground server to cut data volume and latency.
- The sparsified images are transmitted over multi-user MIMO (MU-MIMO) to the ground server, where a Swin-large-based MaskDINO encoder extracts bird's-eye-view (BEV) features and performs cooperative feature fusion for ground vehicle perception.
- A diffusion-model-based deep reinforcement learning (DRL) algorithm jointly selects cooperative UAVs, sparsification ratios, and precoding matrices to balance communication efficiency and perception utility.
- Experiments on the Air-Co-Pred dataset show a perception performance gain of more than 5% and a reduction in communication overhead of about 85% compared to traditional CNN-based BEV fusion baselines.
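
To make the Top-K sparsification step concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say how pixel informativeness is scored, so the sketch uses local gradient magnitude as a stand-in, and the names `top_k_sparsify`, `gradient_magnitude`, and the `keep_ratio` value of 0.15 are all hypothetical.

```python
import numpy as np


def gradient_magnitude(image):
    """Assumed informativeness score: per-pixel gradient magnitude of the grayscale image."""
    gray = image.astype(np.float64).mean(axis=2)
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy)


def top_k_sparsify(image, scores, keep_ratio):
    """Keep the K highest-scoring pixels of an H x W x C image and zero the rest.

    Returns the sparsified image and the boolean H x W mask of kept pixels.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    h, w = scores.shape
    k = max(1, int(round(keep_ratio * h * w)))
    flat = scores.ravel()
    # argpartition places the indices of the k largest scores in the last k slots
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(h * w, dtype=bool)
    mask[top_idx] = True
    mask = mask.reshape(h, w)
    sparse = np.where(mask[..., None], image, 0).astype(image.dtype)
    return sparse, mask


# Synthetic stand-in for a UAV RGB frame (random data, for illustration only)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
scores = gradient_magnitude(image)
sparse, mask = top_k_sparsify(image, scores, keep_ratio=0.15)
kept_fraction = mask.mean()  # fraction of pixels that would be transmitted
```

In the framework described above, the sparsification ratio is not fixed but is chosen per UAV by the DRL agent; here `keep_ratio` is simply passed in by hand. A real transmitter would also send only the kept pixel values and their positions rather than a zero-filled image.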




