Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

arXiv cs.RO / 4/17/2026


Key Points

  • Vision-Language-Action (VLA) models show strong promise for robotic manipulation, but pre-trained policies suffer substantial performance degradation when deployed in downstream environments.
  • The paper proposes VLA-Pilot, which steers a VLA policy purely at inference time, without any fine-tuning or additional data collection, enabling plug-and-play zero-shot deployment.
  • VLA-Pilot is evaluated on six real-world tasks across two distinct robot embodiments, demonstrating effectiveness in both in-distribution and out-of-distribution scenarios.
  • Experimental results show that it substantially boosts the success rates of off-the-shelf pre-trained VLA policies, enabling robust zero-shot generalization across diverse tasks and embodiments.

Abstract

Vision-Language-Action (VLA) models have demonstrated significant potential in real-world robotic manipulation. However, pre-trained VLA policies still suffer from substantial performance degradation during downstream deployment. Although fine-tuning can mitigate this issue, its reliance on costly demonstration collection and intensive computation makes it impractical in real-world settings. In this work, we introduce VLA-Pilot, a plug-and-play inference-time policy steering method for zero-shot deployment of pre-trained VLA without any additional fine-tuning or data collection. We evaluate VLA-Pilot on six real-world downstream manipulation tasks across two distinct robotic embodiments, encompassing both in-distribution and out-of-distribution scenarios. Experimental results demonstrate that VLA-Pilot substantially boosts the success rates of off-the-shelf pre-trained VLA policies, enabling robust zero-shot generalization to diverse tasks and embodiments. Experimental videos and code are available at: https://rip4kobe.github.io/vla-pilot/.
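The abstract does not spell out the steering algorithm, but the general idea of inference-time policy steering can be illustrated with a minimal sketch: sample many candidate actions from a frozen pre-trained policy, score them with a task objective, and evolve the candidate pool toward high-scoring actions without ever updating the policy's weights. Everything here is hypothetical scaffolding, not the paper's method: `frozen_policy` is a stand-in for a pre-trained VLA policy (here just a Gaussian sampler), and `score` is a toy goal-distance objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_policy(obs, n_samples, rng):
    """Stand-in for a pre-trained VLA policy: draws n_samples candidate
    actions (here 2-D end-effector deltas) for an observation. A real
    diffusion-based policy would denoise samples conditioned on obs."""
    return rng.normal(loc=obs["mean_action"], scale=0.5, size=(n_samples, 2))

def score(actions, goal):
    """Toy steering objective: negative distance to a goal position.
    A real system would use a learned or task-specific reward."""
    return -np.linalg.norm(actions - goal, axis=1)

def steer(obs, goal, n_samples=64, n_iters=3, elite_frac=0.25, rng=rng):
    """Evolutionary inference-time steering: sample candidates from the
    frozen policy, keep the elites, and resample around them. The policy
    itself is never updated (no fine-tuning, no data collection)."""
    candidates = frozen_policy(obs, n_samples, rng)
    for _ in range(n_iters):
        scores = score(candidates, goal)
        n_elite = max(1, int(elite_frac * n_samples))
        elites = candidates[np.argsort(scores)[-n_elite:]]
        # Mutate the elites to form the next candidate pool.
        parents = elites[rng.integers(0, n_elite, size=n_samples)]
        candidates = parents + rng.normal(scale=0.2, size=(n_samples, 2))
    return candidates[np.argmax(score(candidates, goal))]

obs = {"mean_action": np.array([0.0, 0.0])}
goal = np.array([1.0, 1.0])
best = steer(obs, goal)
```

The key property this sketch shares with the paper's framing is that all adaptation happens in the sampling loop at test time, so any off-the-shelf policy can be plugged in unchanged.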