ParkSense: Where Should a Delivery Driver Park? Leveraging Idle AV Compute and Vision-Language Models

arXiv cs.CV / 4/10/2026


Key Points

  • The paper introduces ParkSense, a framework that uses idle compute from low-risk AV states to run a vision-language model for precise delivery parking-spot selection near merchant entrances.
  • ParkSense repurposes pre-cached satellite and street-view imagery to identify entrances and legal parking zones, formalizing the Delivery-Aware Precision Parking (DAPP) problem.
  • The authors report that a quantized 7B VLM can perform inference in about 4–8 seconds on HW4-class hardware, supporting near-real-time decision needs.
  • They estimate potential annual per-driver income gains in the U.S. of roughly $3,000–$8,000, arguing the approach can reduce time lost searching for parking.
  • The work outlines five open research directions bridging autonomous driving, computer vision, and last-mile logistics.
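The gating idea behind the first bullet, running the VLM only when the driving stack is lightly loaded, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the state names, the `should_run_vlm` function, and the idle-time threshold are assumptions; only the low-risk state categories (red-light queue, congestion, parking-lot crawl) and the 4–8 s inference window come from the paper.

```python
from enum import Enum, auto

class AVState(Enum):
    # Coarse driving states; only the low-risk ones free up compute
    RED_LIGHT_QUEUE = auto()
    TRAFFIC_CONGESTION = auto()
    PARKING_LOT_CRAWL = auto()
    HIGHWAY_CRUISE = auto()
    URBAN_DRIVING = auto()

# States the paper describes as low-risk: the driving stack is lightly
# loaded, so spare accelerator time can host VLM inference.
LOW_RISK_STATES = {
    AVState.RED_LIGHT_QUEUE,
    AVState.TRAFFIC_CONGESTION,
    AVState.PARKING_LOT_CRAWL,
}

def should_run_vlm(state: AVState, expected_idle_s: float,
                   inference_s: float = 8.0) -> bool:
    """Trigger VLM inference only when (a) the vehicle is in a low-risk
    state and (b) the expected idle window covers the reported worst-case
    4-8 s inference time (hypothetical scheduling policy)."""
    return state in LOW_RISK_STATES and expected_idle_s >= inference_s

print(should_run_vlm(AVState.RED_LIGHT_QUEUE, expected_idle_s=20.0))  # True
print(should_run_vlm(AVState.HIGHWAY_CRUISE, expected_idle_s=60.0))   # False
```

A real scheduler would also need to abort inference if the state changes mid-run; that preemption logic is out of scope for this sketch.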

Abstract

Finding parking consumes a disproportionate share of food delivery time, yet no system addresses precise parking-spot selection relative to merchant entrances. We propose ParkSense, a framework that repurposes idle compute during low-risk AV states -- queuing at red lights, traffic congestion, parking-lot crawl -- to run a Vision-Language Model (VLM) on pre-cached satellite and street-view imagery, identifying entrances and legal parking zones. We formalize the Delivery-Aware Precision Parking (DAPP) problem, show that a quantized 7B VLM completes inference in 4-8 seconds on HW4-class hardware, and estimate annual per-driver income gains of 3,000-8,000 USD in the U.S. Five open research directions are identified at this unexplored intersection of autonomous driving, computer vision, and last-mile logistics.
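A back-of-envelope check shows how an annual gain in the paper's reported $3,000–$8,000 range could arise from recovered parking-search time. Every input below is an assumption chosen for illustration; only the dollar range itself comes from the paper.

```python
# Illustrative arithmetic behind a per-driver annual income gain.
# All four inputs are assumptions, not figures from the paper.
minutes_saved_per_delivery = 2.5   # assumed parking-search time recovered
deliveries_per_day = 20            # assumed full-time courier workload
working_days_per_year = 250
hourly_earnings_usd = 22.0         # assumed effective hourly rate

hours_saved = (minutes_saved_per_delivery * deliveries_per_day
               * working_days_per_year) / 60
annual_gain_usd = hours_saved * hourly_earnings_usd
print(round(annual_gain_usd))  # ~4583, inside the paper's 3,000-8,000 range
```

Varying the assumed time saved per delivery between roughly 1.5 and 4.5 minutes spans the paper's full range, which suggests the estimate is most sensitive to that single parameter.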