Prototype-Based Low Altitude UAV Semantic Segmentation

arXiv cs.CV / 4/3/2026


Key Points

  • The paper proposes PBSeg, a prototype-based semantic segmentation framework for low-altitude UAV imagery, where extreme scale variation and complex object boundaries must be handled within the limited compute budget of edge devices.
  • It introduces prototype-based cross-attention (PBCA) to leverage feature redundancy and reduce computational complexity while aiming to preserve segmentation quality.
  • PBSeg uses an efficient multi-scale feature extraction module that combines deformable convolutions (DConv) with context-aware modulation (CAM) to capture both local details and global semantics.
  • Experiments on UAVid and UDD6 report 71.86% mIoU and 80.92% mIoU respectively, which the authors describe as competitive performance achieved while maintaining computational efficiency.
  • The authors provide implementation code via GitHub, enabling researchers and developers to reproduce and build upon the method.
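
To make the prototype idea in the key points concrete, here is a minimal NumPy sketch of cross-attention between pixel tokens and a small set of prototypes. It is not the authors' PBCA: the summary does not say how prototypes are built or how attention is applied, so the chunk-averaging in `make_prototypes`, the single attention head, the absence of learned projections, and all names and sizes are assumptions chosen for illustration. The point it shows is the cost argument: with N tokens and K prototypes, the score matrix is N x K rather than the N x N of full self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_prototypes(features, num_prototypes):
    # Hypothetical prototype builder (an assumption, not from the paper):
    # average contiguous chunks of tokens into K summary vectors.
    chunks = np.array_split(features, num_prototypes, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def prototype_cross_attention(features, prototypes):
    # features: (N, D) pixel tokens used as queries.
    # prototypes: (K, D) vectors used as both keys and values.
    dim = features.shape[1]
    scores = features @ prototypes.T / np.sqrt(dim)  # (N, K), not (N, N)
    weights = softmax(scores, axis=1)                # each row sums to 1
    return weights @ prototypes, weights             # (N, D), (N, K)

rng = np.random.default_rng(0)
features = rng.normal(size=(64 * 64, 32))   # a 64x64 feature map, 32 channels
prototypes = make_prototypes(features, num_prototypes=16)
out, weights = prototype_cross_attention(features, prototypes)
print(out.shape, weights.shape)  # (4096, 32) (4096, 16)
```

In this toy setting the score matrix holds 4096 x 16 entries instead of 4096 x 4096, which is the kind of saving that redundancy among neighbouring pixel features can make possible.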

Abstract

Semantic segmentation of low-altitude UAV imagery presents unique challenges due to extreme scale variations, complex object boundaries, and limited computational resources on edge devices. Existing transformer-based segmentation methods achieve remarkable performance but incur high computational overhead, while lightweight approaches struggle to capture fine-grained details in high-resolution aerial scenes. To address these limitations, we propose PBSeg, an efficient prototype-based segmentation framework tailored for UAV applications. PBSeg introduces a novel prototype-based cross-attention (PBCA) that exploits feature redundancy to reduce computational complexity while maintaining segmentation quality. The framework incorporates an efficient multi-scale feature extraction module that combines deformable convolutions (DConv) with context-aware modulation (CAM) to capture both local details and global semantics. Experiments on two challenging UAV datasets demonstrate the effectiveness of the proposed approach. PBSeg achieves 71.86% mIoU on UAVid and 80.92% mIoU on UDD6, establishing competitive performance while maintaining computational efficiency. Code is available at https://github.com/zhangda1018/PBSeg.
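
For readers unfamiliar with the reported metric, the following is a small sketch of how mean intersection-over-union (mIoU) is commonly computed from a confusion matrix. The toy label maps and the function names are illustrative assumptions; the exact evaluation protocol used for UAVid and UDD6 (for example ignored labels, or accumulating the confusion matrix over the whole test set before averaging) is not described in the abstract and may differ.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    # Rows index the ground-truth class, columns the predicted class.
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(pred, target, num_classes):
    cm = confusion_matrix(pred, target, num_classes)
    intersection = np.diag(cm)
    # Union = predicted pixels + ground-truth pixels - overlap.
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both maps
    return (intersection[valid] / union[valid]).mean()

# Toy 2x3 label maps with three classes (made-up data).
target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])

# Per-class IoU is 1/3, 2/3 and 1/2, so the mean is 0.5.
print(mean_iou(pred, target, num_classes=3))
```

A score such as 71.86% mIoU therefore means that, averaged over classes, the overlap between predicted and ground-truth regions is about 72% of their union.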