Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

arXiv cs.CL / 4/6/2026


Key Points

  • The paper proposes Speculative Verification for VLA Control (SV-VLA), a framework that reduces the inference cost of Vision-Language-Action (VLA) models while mitigating the error accumulation of open-loop execution.
  • SV-VLA combines a heavy, low-frequency VLA macro-planner that generates action chunks with a lightweight verifier that continuously monitors execution using the latest observations.
  • The verifier compares planned actions against a closed-loop reference conditioned on the current observation and the planning context, triggering replanning only when necessary.
  • Experiments indicate SV-VLA preserves the efficiency benefits of action chunking while improving robustness in dynamic environments.
  • The authors have released the SV-VLA code on GitHub, supporting replication and further development.
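
The plan-then-verify loop in the key points above can be illustrated with a toy, one-dimensional sketch. This is not the authors' implementation. `PlanningContext`, `heavy_plan`, `light_reference`, the absolute-difference test and the `threshold` value are all hypothetical stand-ins. In SV-VLA the planner is a full VLA model, and the verifier and its comparison rule are defined in the paper and repository. The sketch assumes the verifier is queried once per control step and that a failed check immediately replaces the current chunk.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanningContext:
    # Hypothetical stand-in for the planner's context: here only the target
    # the chunk was planned toward. In SV-VLA this comes from the heavy VLA.
    target: float


def clip(a, limit):
    return max(-limit, min(limit, a))


def heavy_plan(obs, target, horizon, max_step):
    """Hypothetical heavy, low-frequency planner: returns an open-loop chunk
    of `horizon` actions plus its planning context. It rolls the state forward
    assuming no disturbances, taking clipped steps toward `target`."""
    actions, x = [], obs
    for _ in range(horizon):
        a = clip(target - x, max_step)
        actions.append(a)
        x += a
    return actions, PlanningContext(target=target)


def light_reference(obs, ctx, max_step):
    """Hypothetical lightweight verifier: a single closed-loop reference action
    computed from the latest observation and the planning context."""
    return clip(ctx.target - obs, max_step)


def run_episode(x0, target, steps, horizon, threshold, max_step=1.0, disturbances=None):
    """Runs one toy episode; returns (final_state, heavy_calls, replans)."""
    disturbances = disturbances or {}
    x, chunk, ctx, idx = x0, [], None, 0
    heavy_calls = replans = 0
    for t in range(steps):
        x += disturbances.get(t, 0.0)  # external perturbation at step t
        if idx >= len(chunk):  # chunk exhausted: regular low-frequency replan
            chunk, ctx = heavy_plan(x, target, horizon, max_step)
            idx, heavy_calls = 0, heavy_calls + 1
        planned = chunk[idx]
        reference = light_reference(x, ctx, max_step)
        if abs(planned - reference) > threshold:  # verification failed
            chunk, ctx = heavy_plan(x, target, horizon, max_step)
            idx, heavy_calls, replans = 0, heavy_calls + 1, replans + 1
            planned = chunk[idx]
        x += planned  # execute one action (toy dynamics: x_{t+1} = x_t + a_t)
        idx += 1
    return x, heavy_calls, replans


if __name__ == "__main__":
    kick = {7: 5.0}  # push the system past the target late in the episode
    print(run_episode(0.0, 10.0, 10, 5, threshold=0.5, disturbances=kick))
    # (10.0, 3, 1): the verifier catches the push and triggers one replan
    print(run_episode(0.0, 10.0, 10, 5, threshold=math.inf, disturbances=kick))
    # (15.0, 2, 0): pure open-loop chunking keeps executing stale actions
```

In this toy, a freshly planned chunk is computed from the same observation the verifier sees, so its first action always passes the check. Divergence appears only when the world drifts away from what the planner predicted, which is the situation closed-loop verification is meant to catch. Setting `threshold` to infinity recovers plain action chunking.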

Abstract

Vision-Language-Action (VLA) models, as large foundation models for embodied control, have shown strong performance in manipulation tasks. However, their performance comes at a high inference cost. To improve efficiency, recent methods adopt action chunking, which predicts a sequence of future actions for open-loop execution. Although effective for reducing computation, open-loop execution is sensitive to environmental changes and prone to error accumulation due to the lack of closed-loop feedback. To address this limitation, we propose Speculative Verification for VLA Control (SV-VLA), a framework that combines efficient open-loop long-horizon planning with lightweight closed-loop online verification. Specifically, SV-VLA uses a heavy VLA as a low-frequency macro-planner to generate an action chunk together with a planning context, while a lightweight verifier continuously monitors execution based on the latest observations. Conditioned on both the current observation and the planning context, the verifier compares the planned action against a closed-loop reference action and triggers replanning only when necessary. Experiments demonstrate that SV-VLA combines the efficiency of chunked prediction with the robustness of closed-loop control, enabling efficient and reliable VLA-based control in dynamic environments. Code is available: https://github.com/edsad122/SV-VLA.
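
The efficiency trade-off described in the abstract can be made concrete with a simplified cost model. This model is an assumption of this summary, not a result from the paper. Consider an episode of $T$ control steps and a chunk length $H$. Let $c_{\text{VLA}}$ be the cost of one heavy planner call and $c_{\text{ver}}$ the cost of one verifier call, with the verifier run once per step. Let $R$ be the number of verifier-triggered replans. Under these assumptions, the three approaches cost:

```latex
\begin{aligned}
C_{\text{per-step}} &= T\,c_{\text{VLA}} \\
C_{\text{chunk}}    &= \lceil T/H \rceil\,c_{\text{VLA}} \\
C_{\text{SV}}       &\le \bigl(\lceil T/H \rceil + R\bigr)\,c_{\text{VLA}} + T\,c_{\text{ver}}
\end{aligned}
```

The bound on heavy calls holds because every chunk that is not cut short by a replan runs its full $H$ steps, except possibly the last one. Each of the $R$ replans therefore adds at most one extra planner call. For example, with $T = 100$, $H = 10$ and $R = 5$, the heavy model is called at most 15 times instead of 100. Speculative verification pays off whenever the verifier is much cheaper than the VLA and replans are rare.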