Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies
arXiv cs.CV / 4/7/2026
Key Points
- The paper proposes an annotation-efficient multimodal framework that combines high-resolution satellite imagery with Google Street View to detect urban trees more scalably than labor-intensive field surveys.
- It uses satellite data to localize likely tree candidates and then retrieves targeted street-level views, reducing wasteful full-area street sampling while improving detection detail.
- To overcome limited annotations when moving to new regions, it applies domain adaptation to transfer learned knowledge from an existing annotated dataset to a new region of interest.
- The study evaluates semi-supervised learning, active learning, and a hybrid of the two with a transformer-based detector; the hybrid strategy performs best, reaching an F1-score of 0.90 (roughly a 12% gain over the baseline).
- Error analysis shows that the active and hybrid strategies reduce both false positives and false negatives, whereas semi-supervised learning can degrade over successive training iterations because pseudo-labeling reinforces the model's own mistakes (confirmation bias).
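The satellite-first retrieval idea in the points above can be sketched as follows: detect tree candidates in overhead imagery, then request only the street-level views that face each candidate instead of densely sampling every street. The candidate coordinates, panorama points, and request parameters below are illustrative assumptions (the paper's exact retrieval procedure is not specified here); the URL format follows the public Google Street View Static API.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def street_view_request(pano_latlon, tree_latlon, size="640x640"):
    """Build a Street View Static API URL aimed from a panorama point
    toward a satellite-derived tree candidate (API key is a placeholder)."""
    heading = bearing_deg(*pano_latlon, *tree_latlon)
    return ("https://maps.googleapis.com/maps/api/streetview"
            f"?size={size}&location={pano_latlon[0]},{pano_latlon[1]}"
            f"&heading={heading:.1f}&key=YOUR_API_KEY")

# Hypothetical (panorama point, tree candidate) pairs from satellite detection.
candidates = [((40.7128, -74.0060), (40.7129, -74.0058))]
urls = [street_view_request(pano, tree) for pano, tree in candidates]
```

Targeting the camera heading at each candidate is what replaces exhaustive street-level sampling: one request per candidate rather than a dense sweep of every road segment.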
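The hybrid annotation strategy can be illustrated with a minimal sketch: high-confidence detections become pseudo-labels (the semi-supervised part), while the most uncertain detections are queued for human annotation (the active-learning part), which is one way routing ambiguous cases to annotators can limit the confirmation bias noted in the error analysis. The threshold, query budget, and mock detector scores are assumptions for illustration, not the paper's values.

```python
def hybrid_split(scores, pseudo_thresh=0.9, query_budget=2):
    """Partition unlabeled samples: indices with confidence >= pseudo_thresh
    are pseudo-labeled; of the rest, the query_budget samples whose
    confidence is closest to 0.5 (most uncertain) go to human annotators."""
    pseudo = [i for i, s in enumerate(scores) if s >= pseudo_thresh]
    uncertain = sorted(
        (i for i, s in enumerate(scores) if s < pseudo_thresh),
        key=lambda i: abs(scores[i] - 0.5),
    )[:query_budget]
    return pseudo, uncertain

scores = [0.97, 0.52, 0.91, 0.48, 0.75, 0.99]  # mock per-detection confidences
pseudo, to_annotate = hybrid_split(scores)
# pseudo -> [0, 2, 5]; to_annotate -> the two scores nearest 0.5
```

In a full training loop this split would be recomputed each round: the model retrains on ground-truth labels plus pseudo-labels, and the human-annotated uncertain samples correct errors that pure self-training would otherwise reinforce.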