AI Navigate

Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation

arXiv cs.AI / 3/18/2026


Key Points

  • The paper proposes a parameter-efficient fine-tuning strategy to improve the linguistic generalization of OpenVLA by synthesizing a general instruction set for the Bridge Dataset V2 using a Large Language Model (LLM).
  • It employs Low-Rank Adaptation (LoRA) to fine-tune OpenVLA on augmented trajectory-command pairs, bridging natural language intent and robotic actions more effectively.
  • The approach generates a diverse set of semantically equivalent but structurally varied commands for existing trajectories to enrich the model's linguistic space.
  • Results indicate improved robustness of the LoRA-enhanced model in novel environments, underscoring the importance of linguistic space enrichment for embodied agents.
  • The work suggests that synthetic instruction augmentation can significantly mitigate zero-shot generalization gaps in state-of-the-art Vision-Language-Action models.
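The augmentation step described above — expanding each trajectory's original command into several semantically equivalent, structurally varied phrasings — can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the real method queries an LLM for paraphrases, whereas here a few fixed templates stand in for LLM output, and the function and trajectory names are hypothetical.

```python
# Hypothetical sketch of the instruction-augmentation step.
# In the paper, an LLM generates the paraphrases; here, fixed
# templates act as a stand-in so the example stays self-contained.

def augment_instruction(command: str, n_variants: int = 3) -> list[str]:
    """Return up to n_variants rephrasings of a robot command."""
    templates = [
        "Please {c}.",
        "Could you {c}?",
        "I need you to {c}.",
        "Your task is to {c}.",
    ]
    c = command.rstrip(".").lower()
    return [t.format(c=c) for t in templates[:n_variants]]

# Each (trajectory, command) pair expands into several pairs that
# share the same action labels but differ in surface form.
pairs = [("traj_0042", "pick up the red cup")]  # illustrative data
augmented = [(traj, variant)
             for traj, cmd in pairs
             for variant in augment_instruction(cmd)]
```

The key property is that every augmented pair keeps its original action trajectory, so only the model's linguistic input distribution is enriched.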

Abstract

Generalization remains a core challenge in embodied AI, as robots must adapt to diverse environments. While OpenVLA represents the State-of-the-Art (SOTA) in Vision-Language-Action models by leveraging large-scale pre-training, its zero-shot performance can be limited when encountering completely new environments. This paper proposes a parameter-efficient fine-tuning strategy to enhance the linguistic generalization of OpenVLA by synthesizing a general instruction set for the Bridge Dataset V2. The paper leverages a Large Language Model (LLM) to generate a rich variety of semantically equivalent but structurally diverse commands for existing trajectories. In this experiment, Low-Rank Adaptation (LoRA) is implemented to fine-tune OpenVLA on the augmented pairs, allowing the model to bridge the gap between complex natural language intent and robotic actions. Results demonstrate the LoRA-enhanced model's robustness, suggesting that enriching the linguistic space of specialized datasets is crucial for embodied agents.
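The LoRA technique used for the fine-tuning can be illustrated with a minimal, dependency-free sketch: the frozen base weight W is augmented with a low-rank update scaled by alpha/r, so only the small matrices A and B are trained. The shapes and values below are illustrative only and are not taken from the paper's configuration.

```python
# Minimal sketch of the LoRA forward pass: h = W x + (alpha/r) * B (A x).
# W is frozen; only A (r x d_in) and B (d_out x r) are trainable.
# All shapes and values here are illustrative, not the paper's setup.

def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha, r):
    base = matvec(W, x)                 # frozen base path
    delta = matvec(B, matvec(A, x))     # low-rank path: B @ (A @ x)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Frozen 2x2 base weight with a rank-1 adapter (A: 1x2, B: 2x1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]
B = [[0.0], [0.0]]   # B starts at zero, so the adapter is a no-op
x = [2.0, 4.0]
print(lora_forward(x, W, A, B, alpha=2, r=1))  # → [2.0, 4.0]
```

Because B is initialized to zero, the adapted model reproduces the pre-trained model exactly at the start of fine-tuning; training then moves only A and B, a small fraction of the total parameters, which is what makes the approach parameter-efficient.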