Diagnosing Urban Street Vitality via a Visual-Semantic and Spatiotemporal Framework for Street-Level Economics

arXiv cs.CV / 4/23/2026

Key Points

  • The paper proposes a visual-semantic and spatiotemporal framework for micro-scale, street-level economic assessment using Street View imagery, aiming to improve on existing methods that remain semantically superficial.
  • It operationalizes the Street Economic Vitality Index (SEVI) by combining physical and semantic streetscape parsing (e.g., signboards, glass interfaces, storefront closures) with a dual-stage VLM-LLM pipeline to standardize signage into global brand hierarchies.
  • To address the static nature of typical Street View data, it introduces a temporal-lag design using location-based services (LBS) data to capture realized demand over time.
  • It builds a three-dimensional diagnostic system that covers Commercial Activity, Spatial Utilization, and Physical Environment using a category-weighted Gaussian spillover model.
  • Experiments in Nanjing using time-lagged geographically weighted regression across eight tidal periods find quasi-causal spatiotemporal heterogeneity, with hierarchical brand clustering and mall-induced externalities identified as key drivers of street vibrancy.
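
To make the category-weighted Gaussian spillover idea more concrete, here is a minimal sketch rather than the paper's implementation. It assumes each commercial source contributes a Gaussian kernel of its distance to a street sample point, scaled by a weight for its category. The category names, weight values, the bandwidth `sigma`, and the function name `gaussian_spillover` are all hypothetical, since the summary does not give them.

```python
import math

# Hypothetical category weights; the paper's actual values are not stated here.
CATEGORY_WEIGHTS = {"mall": 3.0, "restaurant": 1.5, "retail": 1.0}


def gaussian_spillover(point, sources, sigma=100.0, weights=CATEGORY_WEIGHTS):
    """Sum category-weighted Gaussian kernel contributions at `point`.

    point   -- (x, y) in projected metres
    sources -- iterable of (x, y, category) tuples
    sigma   -- kernel bandwidth in metres (assumed value)
    """
    px, py = point
    total = 0.0
    for sx, sy, category in sources:
        squared_distance = (px - sx) ** 2 + (py - sy) ** 2
        kernel = math.exp(-squared_distance / (2.0 * sigma ** 2))
        # Categories missing from the table fall back to a neutral weight of 1.0.
        total += weights.get(category, 1.0) * kernel
    return total


if __name__ == "__main__":
    sources = [(0.0, 0.0, "mall"), (150.0, 0.0, "restaurant")]
    print(gaussian_spillover((50.0, 0.0), sources))
```

Under these assumptions, a larger category weight lets a source such as a mall dominate the field near it, while `sigma` controls how quickly its influence decays along the street.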

Abstract

Micro-scale street-level economic assessment is fundamental for precision spatial resource allocation. While Street View Imagery (SVI) advances urban sensing, existing approaches remain semantically superficial and overlook brand hierarchy heterogeneity and structural recession. To address this, we propose a visual-semantic and field-based spatiotemporal framework, operationalized via the Street Economic Vitality Index (SEVI). Our approach integrates physical and semantic streetscape parsing through instance segmentation of signboards, glass interfaces, and storefront closures. A dual-stage VLM-LLM pipeline standardizes signage into global hierarchies to quantify a spatially smoothed brand premium index. To overcome static SVI limitations, we introduce a temporal lag design using Location-Based Services (LBS) data to capture realized demand. Combined with a category-weighted Gaussian spillover model, we construct a three-dimensional diagnostic system covering Commercial Activity, Spatial Utilization, and Physical Environment. Experiments based on time-lagged geographically weighted regression across eight tidal periods in Nanjing reveal quasi-causal spatiotemporal heterogeneity. Street vibrancy arises from interactions between hierarchical brand clustering and mall-induced externalities. High-quality interfaces show peak attraction during midday and evening, while structural recession produces a lagged nighttime repulsion effect. The framework offers evidence-based support for precision spatial governance.
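
The time-lagged geographically weighted regression mentioned above can be illustrated with a simplified, single-predictor sketch. This is not the authors' model. It assumes a static streetscape indicator `x` measured once per location, demand observed per period in `demand_by_period`, a Gaussian distance kernel, and a lag expressed in period steps. All function and parameter names are hypothetical.

```python
import math


def local_wls(focal, coords, x, y, bandwidth):
    """Fit y = a + b*x around `focal` using Gaussian distance weights.

    Returns the local (intercept, slope) pair.
    """
    fx, fy = focal
    w = [
        math.exp(-((cx - fx) ** 2 + (cy - fy) ** 2) / (2.0 * bandwidth ** 2))
        for cx, cy in coords
    ]
    sw = sum(w)
    # Weighted means of predictor and response.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted variance and covariance give the closed-form slope.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def lagged_gwr(coords, x, demand_by_period, period, lag, bandwidth):
    """Regress demand at `period + lag` on x, once per location."""
    y = demand_by_period[period + lag]
    return [local_wls(c, coords, x, y, bandwidth) for c in coords]


if __name__ == "__main__":
    coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
    x = [1.0, 2.0, 3.0, 4.0]
    demand = [[0.0] * 4, [5.0, 8.0, 11.0, 14.0]]
    for intercept, slope in lagged_gwr(coords, x, demand, 0, 1, 150.0):
        print(round(intercept, 6), round(slope, 6))
```

Because each location gets its own coefficients, the slopes can be mapped per period to show where and when an indicator attracts or repels demand. A full analysis would use multiple predictors and a calibrated bandwidth.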