Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks
arXiv cs.CV / 3/19/2026
Key Points
- The paper introduces a Base-Station-Helped UAV (BHU) framework to enable communication-efficient multi-UAV cooperative perception in low-altitude wireless networks.
- It uses Top-K pixel selection to sparsify UAV-captured RGB images, so that only the most informative pixels are transmitted to a ground server, cutting data volume and latency.
- The sparsified images are transmitted to the ground server via multi-user MIMO (MU-MIMO); at the server, a Swin-large-based MaskDINO encoder extracts bird's-eye-view (BEV) features and performs cooperative feature fusion for ground vehicle perception.
- A diffusion-model-based deep reinforcement learning (DRL) algorithm jointly selects cooperative UAVs, sparsification ratios, and precoding matrices to balance communication efficiency and perception utility.
- Experiments on the Air-Co-Pred dataset show a perception improvement of over 5% while reducing communication overhead by about 85%, compared with traditional CNN-based BEV fusion baselines.
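The Top-K sparsification step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how pixel informativeness is scored, so the sketch takes a caller-supplied per-pixel score map, and the helpers `top_k_sparsify` and `densify` are hypothetical names. The UAV side keeps the k = round(ratio × H × W) highest-scoring pixels together with their coordinates; the server side scatters them back into a zero-filled frame.

```python
import heapq
from typing import List, Sequence, Tuple

# Hypothetical types: an RGB pixel and a transmitted (row, col, pixel) triple.
Pixel = Tuple[int, int, int]
SparsePixel = Tuple[int, int, Pixel]


def top_k_sparsify(
    image: Sequence[Sequence[Pixel]],
    scores: Sequence[Sequence[float]],
    ratio: float,
) -> List[SparsePixel]:
    """UAV side: keep the k highest-scoring pixels, k = round(ratio * H * W)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    height, width = len(image), len(image[0])
    n = height * width
    k = max(1, min(n, round(ratio * n)))
    # Rank flat pixel indices by score; ties keep the earlier index.
    kept = heapq.nlargest(k, range(n), key=lambda i: scores[i // width][i % width])
    # Emit in raster order so the payload is deterministic.
    return [(i // width, i % width, image[i // width][i % width]) for i in sorted(kept)]


def densify(
    sparse: Sequence[SparsePixel],
    height: int,
    width: int,
    fill: Pixel = (0, 0, 0),
) -> List[List[Pixel]]:
    """Server side: scatter received pixels into a frame, filling the rest."""
    frame = [[fill] * width for _ in range(height)]
    for row, col, pixel in sparse:
        frame[row][col] = pixel
    return frame


if __name__ == "__main__":
    # Toy 4x4 image whose score map simply increases in raster order.
    img = [[(4 * r + c, 0, 0) for c in range(4)] for r in range(4)]
    score_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    payload = top_k_sparsify(img, score_map, ratio=0.25)
    print(payload)  # the four pixels of the bottom row
    print(densify(payload, 4, 4)[3])
```

Because this sketch sends coordinates alongside pixel values, its effective payload reduction is somewhat smaller than 1 − ratio; how the paper encodes pixel positions is not stated in the summary.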
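Cooperative BEV fusion on the server can be illustrated with an element-wise maximum over per-UAV feature maps. This is an assumed stand-in: the summary does not specify the fusion operator used after the MaskDINO encoder, and both the `fuse_bev_max` helper and the `[channel][row][col]` nested-list layout are hypothetical. The sketch also assumes every UAV's features have already been projected onto a shared BEV grid of identical shape.

```python
from typing import List, Sequence

# Hypothetical layout: one BEV feature map per UAV, indexed [channel][row][col],
# already projected onto a shared ground-plane grid.
BEVMap = List[List[List[float]]]


def fuse_bev_max(maps: Sequence[BEVMap]) -> BEVMap:
    """Fuse per-UAV BEV maps by taking the element-wise maximum."""
    if not maps:
        raise ValueError("need at least one BEV map")
    channels, rows, cols = len(maps[0]), len(maps[0][0]), len(maps[0][0][0])
    # Reject maps whose shape differs from the first one.
    for m in maps:
        if len(m) != channels or any(len(ch) != rows for ch in m) or any(
            len(row) != cols for ch in m for row in ch
        ):
            raise ValueError("all BEV maps must share the same shape")
    return [
        [
            [max(m[c][r][k] for m in maps) for k in range(cols)]
            for r in range(rows)
        ]
        for c in range(channels)
    ]


if __name__ == "__main__":
    uav_a = [[[1.0, 0.0], [0.0, 2.0]]]  # one channel, 2x2 grid
    uav_b = [[[0.5, 3.0], [0.0, 1.0]]]
    print(fuse_bev_max([uav_a, uav_b]))  # [[[1.0, 3.0], [0.0, 2.0]]]
```

Max fusion keeps the strongest response in each cell, so an object seen clearly by one UAV is not diluted by UAVs that missed it; averaging or learned attention weights are common alternatives.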
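To make the decision space in the DRL bullet concrete, the following brute-force search enumerates every UAV selection and per-UAV sparsification ratio under a payload budget. It is a swapped-in baseline, not the paper's diffusion-based DRL: the `brute_force_schedule` helper, the utility model, and the payload model are all hypothetical toy assumptions, and precoding is left out entirely. With R candidate ratios and N UAVs, each UAV has 1 + R choices, so the search visits (1 + R)^N decisions.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

# Per-UAV decision: None means "not selected", otherwise the sparsification ratio.
Decision = Tuple[Optional[float], ...]


def brute_force_schedule(
    num_uavs: int,
    ratios: Sequence[float],
    utility: Callable[[Decision], float],
    payload: Callable[[Decision], float],
    budget: float,
) -> Tuple[Optional[Decision], float]:
    """Exhaustively search selections and ratios: (1 + len(ratios)) ** num_uavs decisions."""
    best: Optional[Decision] = None
    best_utility = float("-inf")
    for decision in product([None, *ratios], repeat=num_uavs):
        if all(d is None for d in decision):
            continue  # at least one UAV must cooperate
        if payload(decision) > budget:
            continue  # violates the (hypothetical) communication budget
        u = utility(decision)
        if u > best_utility:
            best, best_utility = decision, u
    return best, best_utility


if __name__ == "__main__":
    # Toy assumptions: each UAV has a fixed "view quality", perception utility
    # grows linearly with the fraction of pixels kept, and each image has 1000 pixels.
    quality = [0.9, 0.6, 0.3]
    pixels_per_image = 1000

    def toy_utility(d: Decision) -> float:
        return sum(q * r for q, r in zip(quality, d) if r is not None)

    def toy_payload(d: Decision) -> float:
        return sum(r * pixels_per_image for r in d if r is not None)

    choice, value = brute_force_schedule(3, [0.1, 0.3], toy_utility, toy_payload, budget=450)
    print(choice, round(value, 2))  # (0.3, 0.1, None) 0.33
```

Precoding matrices are continuous and cannot be enumerated this way at all; together with the exponential growth in N, this is consistent with the paper's choice of a learned policy over exhaustive search.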



