Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

arXiv cs.CV / 4/6/2026


Key Points

  • The paper introduces UAVGen, a layout-to-image diffusion framework designed specifically to improve UAV-based object detection in dynamic scenes with limited labeled data.
  • It proposes a Visual Prototype Conditioned Diffusion Model (VPC-DM) that uses class-level visual prototypes embedded in the latent space to generate higher-fidelity object instances.
  • UAVGen also adds a Focal Region Enhanced Data Pipeline (FRE-DP) that emphasizes object-dense foreground regions during synthetic data generation to reduce boundary-related artifacts for tiny objects.
  • A label refinement step is included to correct missing, extra, and misaligned generations, improving the usefulness of synthesized training images.
  • Experiments report that UAVGen significantly outperforms prior state-of-the-art methods and improves detection accuracy across multiple detector architectures, with code released publicly.
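The label refinement step described above can be illustrated with a minimal sketch. The paper's actual procedure is not detailed here, so the function names, thresholds, and matching rule below are assumptions: compare the input layout boxes against boxes a detector predicts on the synthesized image, drop layout boxes with no matching detection ("missing" generations), ignore detections without a layout match ("extra" generations), and snap matched layout boxes to the detected positions ("misaligned" generations).

```python
# Illustrative sketch only -- not UAVGen's actual refinement algorithm.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def refine_labels(layout_boxes, detected_boxes, iou_thr=0.5):
    """Keep only layout boxes confirmed by a detection, snapped to it."""
    refined = []
    for lb in layout_boxes:
        # Best-matching detection for this layout box, if any.
        best = max(detected_boxes, key=lambda db: iou(lb, db), default=None)
        if best is not None and iou(lb, best) >= iou_thr:
            refined.append(best)  # misaligned: snap to the detected box
        # else: the object was not generated ("missing"), so drop its label
    return refined
```

For example, if the layout asked for two objects but the diffusion model only rendered one (slightly shifted), the refined label set keeps just the rendered object at its detected position.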

Abstract

Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proven effective at improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address these issues, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions during synthesis, combined with a label refinement step that corrects missing, extra, and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches and consistently improves accuracy when integrated with distinct detectors. The source code is available at https://github.com/Sirius-Li/UAVGen.
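The focal-region idea, emphasizing object-dense foreground regions, can be sketched as a simple crop selection. This is an illustration under assumptions, not FRE-DP's actual pipeline: slide a fixed-size window over the image and pick the position containing the most object centers, so that synthesis (or training crops) concentrates on dense regions where tiny objects cluster.

```python
# Illustrative sketch only -- not the paper's FRE-DP algorithm.

def focal_crop(img_w, img_h, boxes, crop_w, crop_h, stride=32):
    """Return ((x, y), count): top-left of the crop window covering the
    most object centers, and how many centers it covers."""
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    best_xy, best_count = (0, 0), -1
    for y in range(0, max(1, img_h - crop_h + 1), stride):
        for x in range(0, max(1, img_w - crop_w + 1), stride):
            # Count object centers falling inside this candidate window.
            count = sum(x <= cx < x + crop_w and y <= cy < y + crop_h
                        for cx, cy in centers)
            if count > best_count:
                best_xy, best_count = (x, y), count
    return best_xy, best_count
```

A density-weighted or soft-attention variant would serve the same purpose; the hard argmax here just keeps the idea explicit.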