HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

arXiv cs.RO / 4/28/2026



Key Points

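Vision-Language-Action (VLA) models are becoming the mainstream approach to robot control, but their inference is slow, and Speculative Decoding (SD) is drawing attention as a way to accelerate it.
SD comes in two families, "drafter-based" and "retrieval-based". Their strengths and weaknesses are complementary, which motivates the hypothesis that a hybrid of the two would be effective.
The analysis identifies obstacles to implementing hybrid SD in VLA models: draft rejection and persistent errors on the retrieval side, and the difficulty of deciding where the hybrid boundary should lie.
To address these, HeiSD proposes a retrieval-based optimization built on a "verify-skip" mechanism and "sequence-wise relaxed acceptance", plus a kinematics-based fused metric that determines the hybrid boundary automatically.
In experiments, HeiSD is reported to achieve up to a 2.45x speedup in simulation and 2.06x to 2.41x in real-world settings while maintaining a high task success rate.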
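Both SD families share the same draft-then-verify loop and differ only in where the draft comes from. The following is a minimal, greedy sketch under stated assumptions, not HeiSD's implementation: the function names (`retrieval_draft`, `drafter_draft`, `speculative_step`), the n-gram datastore, and the toy "models" are all hypothetical. It shows a retrieval-based drafter (copying a continuation from a datastore) and a drafter-based one (rolling out a cheap model) feeding the same verification step, where the target model keeps the longest agreeing prefix and corrects the first mismatch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = int


def retrieval_draft(prefix: List[Token],
                    datastore: Dict[Tuple[Token, ...], List[Token]],
                    n: int = 2, k: int = 4) -> List[Token]:
    """Retrieval-based drafting (hypothetical n-gram lookup): copy up to k
    tokens that followed the last n tokens in a prebuilt datastore."""
    return list(datastore.get(tuple(prefix[-n:]), []))[:k]


def drafter_draft(prefix: List[Token],
                  small_next: Callable[[List[Token]], Token],
                  k: int = 4) -> List[Token]:
    """Drafter-based drafting: roll out a cheap model greedily for k tokens."""
    ctx, out = list(prefix), []
    for _ in range(k):
        tok = small_next(ctx)
        out.append(tok)
        ctx.append(tok)
    return out


def speculative_step(prefix: List[Token], draft: Sequence[Token],
                     target_next: Callable[[List[Token]], Token]) -> List[Token]:
    """Greedy verification: keep the longest draft prefix the target agrees
    with, then emit the target's own token, so every step yields >= 1 token.
    Real systems score all draft positions in one batched forward pass;
    this loop is sequential only for clarity."""
    ctx, accepted = list(prefix), []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # correction from the target model
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # bonus token after a fully accepted draft
    return accepted


if __name__ == "__main__":
    target = lambda ctx: (ctx[-1] + 1) % 256  # toy "target model"
    store = {(3, 4): [5, 6, 7, 9]}            # toy retrieval datastore
    print(speculative_step([3, 4], retrieval_draft([3, 4], store), target))
    # -> [5, 6, 7, 8]: three draft tokens accepted, 9 corrected to 8
```

Retrieval drafts are nearly free to produce but fail whenever the datastore has no matching context (an empty draft degrades to ordinary one-token decoding), while a model drafter always proposes something at the cost of extra compute. That trade-off is the complementarity the key points refer to.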

Abstract

Vision-Language-Action (VLA) models have become the mainstream solution for robot control, but they suffer from slow inference. Speculative Decoding (SD) is a promising acceleration method that falls into two categories: drafter-based SD and retrieval-based SD. When applied to VLA models, the two exhibit complementary advantages and limitations, which leads to the hypothesis that a hybrid approach integrating both will perform better. In this paper, we first conduct a series of detailed analyses that reveal the advantages and feasibility of hybrid use. Even with these insights, however, implementing hybrid SD in VLA models presents two challenges: (1) draft rejection and persistent errors in retrieval-based SD, and (2) difficulty in determining the hybrid boundary. To address them, we propose the HeiSD framework. HeiSD includes a retrieval-based SD optimization method consisting of a verify-skip mechanism and a sequence-wise relaxed acceptance strategy. HeiSD also introduces a kinematic-based fused metric that automatically determines the hybrid boundary. Experimental results show that HeiSD achieves a speedup of up to 2.45x on simulation benchmarks and 2.06x to 2.41x in real-world scenarios while sustaining a high task success rate.
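The abstract names three mechanisms without defining them. As a rough, hypothetical illustration only, this sketch guesses at one plausible shape for each; none of the definitions, weights, or thresholds come from the paper. It assumes actions are discretized into integer bins (as in several VLA models) and treats verify-skip as accepting a retrieved draft without calling the target model when a hypothetical `retrieval_score` clears `skip_threshold`. Sequence-wise relaxed acceptance is modeled as accepting the whole drafted sequence when its summed bin deviation from the target's tokens stays within `tol`. The kinematic fused metric is modeled as a weighted sum of end-effector speed and acceleration that selects the drafter.

```python
import math
from typing import Callable, List, Sequence, Tuple

# HYPOTHETICAL sketch: every definition below is an illustrative guess,
# not HeiSD's actual verify-skip, relaxed acceptance, or fused metric.


def kinematic_metric(positions: Sequence[Sequence[float]], dt: float,
                     w_vel: float = 1.0, w_acc: float = 1.0) -> float:
    """Fuse end-effector speed and acceleration magnitude (finite differences
    over the last three positions) into one scalar; weights are assumptions."""
    p0, p1, p2 = positions[-3], positions[-2], positions[-1]
    prev_vel = [(b - a) / dt for a, b in zip(p0, p1)]
    vel = [(b - a) / dt for a, b in zip(p1, p2)]
    acc = [(v - u) / dt for u, v in zip(prev_vel, vel)]
    speed = math.sqrt(sum(v * v for v in vel))
    acc_mag = math.sqrt(sum(a * a for a in acc))
    return w_vel * speed + w_acc * acc_mag


def choose_drafter(metric: float, boundary: float) -> str:
    """Assumed rule: calm motion -> cheap retrieval drafts; else a model drafter."""
    return "retrieval" if metric < boundary else "drafter"


def relaxed_accept(draft: List[int], target: List[int], tol: int) -> List[int]:
    """Accept the whole draft if its summed bin deviation from the target's
    tokens is <= tol; otherwise fall back to exact-match prefix acceptance."""
    if len(draft) != len(target):
        raise ValueError("draft and target must have the same length")
    if sum(abs(d - t) for d, t in zip(draft, target)) <= tol:
        return list(draft)
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            accepted.append(t)  # target's token replaces the first mismatch
            break
        accepted.append(d)
    return accepted


def hybrid_step(draft: List[int],
                target_fn: Callable[[List[int]], List[int]],
                retrieval_score: float,
                skip_threshold: float,
                tol: int) -> Tuple[List[int], bool]:
    """Return (accepted_tokens, verified). Assumed verify-skip: trust a
    retrieved draft outright when its match score is high enough."""
    if retrieval_score >= skip_threshold:
        return list(draft), False  # target model is never called
    target = target_fn(draft)  # target's greedy token at each draft position
    return relaxed_accept(draft, target, tol), True
```

In this toy setup, a steady trajectory yields a small fused metric and routes drafting to retrieval, and a tolerance of a couple of bins lets a near-miss action sequence through instead of truncating it at the first mismatched token, which is one way persistent small retrieval errors could be absorbed rather than repeatedly rejected.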

Related Articles

DeepSeek V4 Released: 1.6T Parameters, 1M Context, and Floor-Shattering Prices

Dev.to

Understanding Intelligent Automation Integration: A Complete Beginner's Guide

Dev.to

The AI Era Begins: A 2025 Review and Summary

Dev.to

Building an AI food tracker and currently tackling Apple Health integration. How do you prefer your "active calories" to be handled?

Reddit r/artificial

The New Era of GEO: How Traffic Generator AI is Changing the Game

Dev.to
