GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
arXiv cs.CV / 4/22/2026
Key Points
- The paper introduces GOLD-BEV, a framework for learning dense, scene-centric semantic BEV maps that include dynamic agents using ego-centric sensors.
- It uses time-synchronized aerial imagery as training supervision by aligning BEV with aerial crops, which provides an intuitive target and reduces ambiguity compared with ego-only BEV labeling.
- By enforcing strict aerial-ground synchronization, the method more reliably supervises moving traffic participants and reduces temporal inconsistencies seen in non-synchronized overhead sources.
- For scalable dense targets, the authors generate BEV pseudo-labels with domain-adapted aerial “teachers” and jointly train BEV segmentation, optionally adding pseudo-aerial BEV reconstruction for interpretability.
- The approach further synthesizes pseudo-aerial BEV images from ego sensors to enable lightweight human annotation and uncertainty-aware pseudo-labeling on unlabeled driving data.
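The uncertainty-aware pseudo-labeling mentioned in the last point can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the aerial "teacher" emits per-pixel class probabilities over a BEV grid, keeps the argmax class where confidence clears a threshold, and marks the rest with an ignore index so they drop out of the segmentation loss. The threshold `tau` and the ignore value `255` are hypothetical conventions.

```python
import numpy as np

IGNORE = 255  # label value excluded from the loss (assumed convention)

def pseudo_labels(teacher_probs: np.ndarray, tau: float = 0.9) -> np.ndarray:
    """Turn a teacher's per-pixel class probabilities into hard BEV
    pseudo-labels, masking out low-confidence pixels.

    teacher_probs: (H, W, C) softmax output over the BEV grid.
    tau: confidence threshold below which a pixel is ignored.
    """
    conf = teacher_probs.max(axis=-1)                      # per-pixel confidence
    labels = teacher_probs.argmax(axis=-1).astype(np.int64)  # hard class choice
    labels[conf < tau] = IGNORE                            # uncertainty masking
    return labels

# Toy example: a 2x2 BEV grid with 3 classes.
probs = np.array([[[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]],
                  [[0.10, 0.85, 0.05], [0.33, 0.33, 0.34]]])
print(pseudo_labels(probs, tau=0.8))
# → [[  0 255]
#    [  1 255]]
```

Only the two confident pixels keep their teacher class; the ambiguous ones are ignored, so noisy aerial predictions do not directly supervise the BEV student there.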