UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images

arXiv cs.CV / 3/25/2026

Key Points

  • UrbanVGGT is a measurement pipeline that estimates metric sidewalk width from a single street-view image, targeting the scarcity of large-scale width data and the cost and scalability limits of prior approaches.
  • The method integrates semantic segmentation, feed-forward 3D reconstruction, adaptive ground-plane fitting, camera-height-based scale calibration, and directional width measurement on the reconstructed plane.
  • On a Washington, D.C. ground-truth benchmark, UrbanVGGT reports a mean absolute error of 0.252 m and 95.5% of estimates within 0.50 m of reference widths.
  • Ablation experiments indicate that camera-height-based metric scale calibration is the most critical component for accuracy, while controlled comparisons with alternative geometry backbones support the overall design.
  • As a feasibility demonstration, the paper applies the pipeline to three cities and generates a prototype dataset (SV-SideWidth) covering 527 OpenStreetMap street segments, while noting that broader cross-city validation and local ground-truth auditing are needed before use as authoritative planning data.
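
The two accuracy figures quoted above (mean absolute error and the share of estimates within 0.50 m) can be computed as in the following minimal sketch. The function name `width_error_metrics`, the sample values, and the choice of an inclusive threshold (`<=`) are assumptions made for illustration; they are not taken from the paper.

```python
def width_error_metrics(estimates, references, tolerance_m=0.50):
    """Return (MAE in metres, fraction of estimates within tolerance_m).

    Hypothetical helper: the paper's exact evaluation code is not shown
    in the source, and the inclusive threshold is an assumption.
    """
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(e - r) for e, r in zip(estimates, references)]
    mae = sum(abs_errors) / len(abs_errors)
    within = sum(err <= tolerance_m for err in abs_errors) / len(abs_errors)
    return mae, within


# Made-up widths in metres, purely illustrative (not benchmark data).
est = [1.50, 2.10, 1.20, 3.00]
ref = [1.40, 2.00, 1.80, 3.10]
mae, within = width_error_metrics(est, ref)
print(f"MAE = {mae:.3f} m, within 0.50 m = {within:.1%}")
```

Reporting both numbers is useful because the MAE summarizes typical error while the within-threshold share shows how often an estimate misses by a practically significant margin.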

Abstract

Sidewalk width is an important indicator of pedestrian accessibility, comfort, and network quality, yet large-scale width data remain scarce in most cities. Existing approaches typically rely on costly field surveys, high-resolution overhead imagery, or simplified geometric assumptions that limit scalability or introduce systematic error. To address this gap, we present UrbanVGGT, a measurement pipeline for estimating metric sidewalk width from a single street-view image. The method combines semantic segmentation, feed-forward 3D reconstruction, adaptive ground-plane fitting, camera-height-based scale calibration, and directional width measurement on the recovered plane. On a ground-truth benchmark from Washington, D.C., UrbanVGGT achieves a mean absolute error of 0.252 m, with 95.5% of estimates within 0.50 m of the reference width. Ablation experiments show that metric scale calibration is the most critical component, and controlled comparisons with alternative geometry backbones support the effectiveness of the overall design. As a feasibility demonstration, we further apply the pipeline to three cities and generate SV-SideWidth, a prototype sidewalk-width dataset covering 527 OpenStreetMap street segments. The results indicate that street-view imagery can support scalable generation of candidate sidewalk-width attributes, while broader cross-city validation and local ground-truth auditing remain necessary before deployment as authoritative planning data.
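
The geometric core of such a pipeline (ground-plane fitting, camera-height-based scale calibration, and directional width measurement) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a plain least-squares plane fit in place of the paper's adaptive fitting, assumes the camera sits at the origin of the reconstruction, assumes the measurement direction is already known, and uses a made-up camera height of 2.5 m. All function names are hypothetical.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through N x 3 points.

    Returns a unit normal n and offset d such that n . x + d = 0.
    (Plain SVD fit; a stand-in for the paper's adaptive fitting.)
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value
    # is the direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return normal, -normal @ centroid


def metric_scale(d, camera_height_m):
    """Metres per reconstruction unit, assuming the camera is at the origin.

    With a unit normal, |d| is the origin-to-plane distance, i.e. the
    camera height expressed in (unscaled) reconstruction units.
    """
    return camera_height_m / abs(d)


def directional_width(sidewalk_points, normal, direction, scale):
    """Extent of the sidewalk points along an in-plane direction, in metres."""
    # Remove any out-of-plane component, then normalise.
    direction = direction - (direction @ normal) * normal
    direction = direction / np.linalg.norm(direction)
    t = sidewalk_points @ direction
    return (t.max() - t.min()) * scale


# Synthetic example: ground plane at y = 1.25 reconstruction units
# (camera frame, y pointing down); sidewalk spans x in [0.5, 1.25].
xs, zs = np.meshgrid(np.linspace(0.5, 1.25, 4), np.linspace(2.0, 4.0, 5))
sidewalk = np.column_stack([xs.ravel(), np.full(xs.size, 1.25), zs.ravel()])

normal, d = fit_plane(sidewalk)
scale = metric_scale(d, camera_height_m=2.5)  # assumed camera height
width = directional_width(sidewalk, normal, np.array([1.0, 0.0, 0.0]), scale)
print(f"scale = {scale:.2f} m/unit, width = {width:.2f} m")
```

The sketch also makes the abstract's ablation finding intuitive: the width is a reconstruction-space extent multiplied by a single scale factor, so any error in that factor propagates proportionally into every width estimate.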