Exploring the Use of VLMs for Navigation Assistance for People with Blindness and Low Vision
arXiv cs.AI / 3/18/2026
Key Points
- The paper investigates the potential of vision-language models (VLMs) to assist people with blindness and low vision (pBLV) in navigation, evaluating both closed-source and open-source models such as GPT-4V, GPT-4o, Gemini-1.5-Pro, Claude-3.5-Sonnet, Llava-v1.6-mistral, and Llava-onevision-qwen.
- GPT-4o consistently outperforms other models across tasks, especially in spatial reasoning and scene understanding, while open-source models show limitations in nuanced reasoning and adaptability in complex environments.
- Common challenges identified include difficulty counting objects in cluttered scenes, biases in spatial reasoning, and a tendency to emphasize object details over spatial guidance, which reduces navigation usability for pBLV.
- The study concludes that VLMs hold promise for wayfinding if they are better aligned with human feedback and gain stronger spatial reasoning, offering actionable insights for integrating VLMs into assistive technologies.
- The results provide guidance on the strengths and limitations of current VLMs and outline directions for improving their usability in real-world navigation applications for pBLV.
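The paper does not publish its prompting code, but the evaluation setup it describes, pairing a scene image with a navigation question posed to a VLM, can be sketched as a single multimodal chat request. The function name, placeholder image bytes, and prompt wording below are illustrative assumptions, not the authors' actual pipeline; the request shape follows the widely used OpenAI chat-completions message format for image inputs:

```python
import base64

def build_navigation_query(image_b64: str, question: str) -> dict:
    """Assemble a chat-completions request body that pairs a base64-encoded
    scene image with a navigation question for a VLM (hypothetical helper)."""
    return {
        "model": "gpt-4o",  # assumed model id; the study also tests others
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    }

# Placeholder bytes stand in for a real scene photo; a spatial-reasoning
# question of the kind the study probes is attached alongside it.
fake_image = base64.b64encode(b"\xff\xd8\xff").decode()
request = build_navigation_query(
    fake_image,
    "Describe any obstacles between me and the door, with directions "
    "relative to my current position.",
)
print(request["model"])
```

Sending such a request to each model under test and scoring the returned directions against ground truth is one plausible way to reproduce the kind of comparison the paper reports.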