LiveVLN: Breaking the Stop-and-Go Loop in Vision-Language Navigation

arXiv cs.RO / 4/22/2026


Key Points

  • The paper argues that real-world vision-language navigation still suffers from visible stop-and-go behavior because the sense–inference–execution loop is blocking, forcing the controller to wait for sensing, transmission, and inference before moving again.
  • It introduces LiveVLN, a training-free runtime framework that augments pretrained VLM navigators with multi-step action continuation so the system can keep actions available while newly arrived observations are being processed.
  • LiveVLN overlaps execution with the processing of fresh observations, handing off refreshed future actions before the currently executable action prefix is exhausted, which reduces idle waiting.
  • Experiments on R2R and RxR show that the method preserves benchmark performance while reducing waiting time and improving action availability.
  • In deployment-oriented evaluations on StreamVLN and NaVIDA, LiveVLN reduces average episode waiting time by up to 77.7% and shortens wall-clock episode time by 12.6%–19.6%.

Abstract

Recent navigation systems achieve strong benchmark results, yet real-world deployment often remains visibly stop-and-go. This bottleneck arises because the sense-inference-execution loop is still blocking: after each new observation, the controller must wait for sensing, transmission, and inference before motion can continue. Reducing action-generation cost alone therefore does not remove redundant waiting. To address this issue, we present LiveVLN, a training-free framework for more continuous embodied navigation by augmenting pretrained VLM navigators with multi-step action continuation. Instead of pausing for each full sense-and-inference round, LiveVLN overlaps execution with the processing of newly arrived observations, allowing refreshed future actions to be handed off before the current executable prefix is exhausted. This design keeps actions continuously available during motion, reducing idle waiting and enabling smoother online execution. The framework operates at runtime and can be integrated with compatible pretrained VLM navigators. Across R2R and RxR, LiveVLN preserves benchmark performance while reducing waiting time and improving action availability. In real-world deployments, it cuts average episode waiting time by up to 77.7% and shortens wall-clock episode time by 12.6% on StreamVLN and 19.6% on NaVIDA, yielding more coherent execution during deployment. Code is available at https://github.com/NIneeeeeem/LiveVLN.
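The core runtime idea — keep executing a previously planned action prefix while inference on the newest observation runs in the background, then hand off the refreshed actions before the prefix runs out — can be sketched as a simple threaded pattern. This is a minimal illustration, not the paper's implementation: the `LiveExecutor` class, the `navigator.plan(obs)` interface, and the `horizon` parameter are all hypothetical names chosen for this sketch.

```python
import queue
import threading


class LiveExecutor:
    """Sketch of non-blocking, LiveVLN-style execution (hypothetical API).

    The executor consumes a queue of previously planned actions while a
    background thread runs model inference on the newest observation.
    Refreshed future actions replace the stale tail of the queue, so
    motion can continue instead of idling during inference.
    """

    def __init__(self, navigator, horizon=4):
        self.navigator = navigator  # assumed to expose .plan(obs) -> list of actions
        self.horizon = horizon      # max number of future actions kept executable
        self.actions = queue.Queue()

    def _infer(self, obs):
        # Runs in the background; execution of queued actions continues meanwhile.
        new_actions = self.navigator.plan(obs)  # multi-step action continuation
        # Swap the stale tail for the refreshed future actions.
        while not self.actions.empty():
            try:
                self.actions.get_nowait()
            except queue.Empty:
                break
        for a in new_actions[: self.horizon]:
            self.actions.put(a)

    def step(self, obs, execute):
        # Kick off inference on the fresh observation without blocking motion.
        worker = threading.Thread(target=self._infer, args=(obs,))
        worker.start()
        executed = []
        # Keep executing whatever actions are currently available.
        while worker.is_alive() or not self.actions.empty():
            try:
                action = self.actions.get(timeout=0.01)
                execute(action)
                executed.append(action)
            except queue.Empty:
                pass  # no action ready yet; inference is still running
        worker.join()
        return executed
```

In a real deployment the hand-off would also need to reconcile the refreshed plan with actions already executed (the prefix), and inference would run on an accelerator rather than a Python thread; the sketch only shows the overlap structure that removes blocking waits.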