UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model
arXiv cs.CV / 5/7/2026
Key Points
- The paper addresses Remote Sensing Image Change Captioning by moving from binary change masks to spatially grounded, semantic natural-language descriptions of scene evolution.
- It proposes PTNet, a prototype-guided, task-adaptive framework that models structured change semantics and incorporates change-detection priors to improve consistency between detected changes and generated captions.
- PTNet uses a learnable prototype bank for cross-temporal interaction, multi-head gating to separate task-specific representations, and detection-derived spatial priors during caption generation to retain fine-grained spatial sensitivity.
- The authors introduce UCCD, a large-scale UAV-based benchmark with 9,000 high-resolution bi-temporal image pairs and 45,000 annotated sentences (an average of five per pair), focused on urban construction monitoring.
- Experiments on UCCD and WHU-CDC show PTNet consistently outperforms prior methods, and the dataset and code are released publicly.
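
The three mechanisms named in the key points (a prototype bank for cross-temporal interaction, task-specific gating, and a detection-derived spatial prior) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, the use of softmax attention over prototypes, and the single sigmoid gate per task are all assumptions made for illustration, and the weights are random stand-ins for parameters that would be learned in practice.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prototype_interaction(f1, f2, prototypes):
    """Hypothetical cross-temporal interaction through a shared prototype bank.

    f1, f2:      (N, D) token features of the "before" and "after" images.
    prototypes:  (K, D) prototype bank (learned in a real model).
    Returns an (N, D) change feature: the difference between the two
    prototype-based reconstructions of each token.
    """
    scale = np.sqrt(prototypes.shape[1])
    a1 = softmax(f1 @ prototypes.T / scale)  # (N, K) assignment at time 1
    a2 = softmax(f2 @ prototypes.T / scale)  # (N, K) assignment at time 2
    return (a2 - a1) @ prototypes

def task_gates(change, w_det, w_cap):
    """Assumed gating: one sigmoid gate per task splits the shared feature."""
    det_feat = sigmoid(change @ w_det) * change  # detection branch
    cap_feat = sigmoid(change @ w_cap) * change  # captioning branch
    return det_feat, cap_feat

def apply_spatial_prior(cap_feat, det_feat, w_head):
    """Reweight caption tokens by a per-token change probability."""
    prior = sigmoid(det_feat @ w_head)  # (N,) values in (0, 1)
    return cap_feat * prior[:, None], prior

# Toy usage with random features and weights (illustrative sizes only).
rng = np.random.default_rng(0)
N, D, K = 6, 8, 4
f1, f2 = rng.normal(size=(N, D)), rng.normal(size=(N, D))
prototypes = rng.normal(size=(K, D))
change = prototype_interaction(f1, f2, prototypes)
det_feat, cap_feat = task_gates(
    change, rng.normal(size=(D, D)), rng.normal(size=(D, D))
)
weighted_cap, prior = apply_spatial_prior(cap_feat, det_feat, rng.normal(size=D))
```

In this sketch, identical "before" and "after" features produce a zero change feature, so the caption branch receives no signal; a caption decoder would consume `weighted_cap`, in which tokens the detection branch considers changed carry more weight.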