Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation
arXiv cs.AI / 3/18/2026
Key Points
- The paper proposes a parameter-efficient fine-tuning strategy to improve the linguistic generalization of OpenVLA by synthesizing a generalized instruction set for the Bridge Dataset V2 with a Large Language Model (LLM); the first sketch after this list illustrates the augmentation step.
- It employs Low-Rank Adaptation (LoRA) to fine-tune OpenVLA on the augmented trajectory-command pairs, bridging natural-language intent and robotic actions more effectively; the second sketch after this list illustrates the setup.
- The approach generates diverse, semantically equivalent but structurally varied commands for existing trajectories, enriching the linguistic space the model is exposed to.
- Results indicate improved robustness of the LoRA-enhanced model in novel environments, underscoring the importance of linguistic space enrichment for embodied agents.
- The work suggests that synthetic instruction augmentation can significantly mitigate zero-shot generalization gaps in state-of-the-art Vision-Language-Action models.
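To make the augmentation step concrete, here is a minimal sketch of LLM-based instruction paraphrasing, assuming an OpenAI-style chat API. The paper does not name its LLM, prompt, or paraphrase count, so the model name, prompt wording, and `n=5` below are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of synthetic instruction augmentation: for each existing
# trajectory command, ask an LLM for semantically equivalent but
# structurally varied rewrites, then pair every rewrite with the
# original trajectory. Model and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Rewrite the following robot command in {n} semantically equivalent "
    "but structurally different ways. Return one rewrite per line.\n"
    "Command: {command}"
)

def paraphrase_command(command: str, n: int = 5) -> list[str]:
    """Ask the LLM for n paraphrases of a single trajectory command."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable instruction-following LLM
        messages=[{"role": "user", "content": PROMPT.format(n=n, command=command)}],
    )
    lines = response.choices[0].message.content.strip().splitlines()
    # Strip list markers like "1." or "-" that the model may prepend.
    return [line.lstrip("0123456789.-) ").strip() for line in lines if line.strip()]

def augment_dataset(pairs: list[tuple[str, object]]) -> list[tuple[str, object]]:
    """Expand (command, trajectory) pairs: each trajectory is duplicated
    under every paraphrased command, enriching the linguistic space."""
    augmented = list(pairs)
    for command, trajectory in pairs:
        for variant in paraphrase_command(command):
            augmented.append((variant, trajectory))
    return augmented
```

Note that the trajectories themselves are untouched; only their language labels are multiplied, which is what widens the model's linguistic coverage without collecting new robot data.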
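The second sketch shows a LoRA fine-tuning setup via Hugging Face PEFT, assuming the public openvla/openvla-7b checkpoint. The rank, alpha, dropout, and target-module choices are placeholder hyperparameters, not the paper's reported configuration.

```python
# Sketch of parameter-efficient fine-tuning: wrap OpenVLA's backbone
# with LoRA adapters so only a small fraction of weights is trained
# on the augmented trajectory-command pairs.
import torch
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

base = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # OpenVLA ships custom modeling code
)

lora_config = LoraConfig(
    r=32,                         # low-rank dimension (assumption)
    lora_alpha=16,                # scaling factor (assumption)
    lora_dropout=0.05,
    target_modules="all-linear",  # adapt every linear layer in the backbone
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically ~1% of weights are trainable

# Training then proceeds with the standard next-token action loss;
# only the LoRA adapter matrices receive gradients.
```

Because gradients flow only through the adapters, the base model's vision-language knowledge is preserved while the language-to-action mapping adapts to the enriched instruction set.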
Related Articles
I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).
Dev.to
Interesting loop
Reddit r/LocalLLaMA
Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants
Reddit r/LocalLLaMA
A supervisor or "manager" AI agent is the wrong way to control AI
Reddit r/artificial
FeatherOps: Fast fp8 matmul on RDNA3 without native fp8
Reddit r/LocalLLaMA