Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

arXiv cs.RO / 3/25/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper studies how to achieve better Sim-to-Real generalization for dexterous manipulation when using synthetic data instead of expensive real-world collection.
It empirically evaluates four main determinants—multi-level domain randomization, photorealistic rendering, physics-realistic modeling, and reinforcement learning update strategies—to identify what most influences transfer performance.
The authors introduce a comprehensive evaluation protocol that measures real-world task performance while systematically varying background, lighting, distractors, object types, and spatial features.
Experiments across more than 10,000 real-world trials yield actionable insights on which simulation ingredients drive stronger generalist policy transfer, with explicit relevance to Vision-Language-Action (VLA) models.
To enable reproducibility and standardized benchmarking, the study releases the robotic platforms and the evaluation protocol for public use.

Abstract

Learning a generalist control policy for dexterous manipulation typically relies on large-scale datasets. Given the high cost of real-world data collection, a practical alternative is to generate synthetic data through simulation. However, the resulting synthetic data often exhibits a significant gap from real-world distributions. While many prior studies have proposed algorithms to bridge the Sim-to-Real discrepancy, there remains a lack of principled research that grounds these methods in real-world manipulation tasks, particularly their performance on generalist policies such as Vision-Language-Action (VLA) models. In this study, we empirically examine the primary determinants of Sim-to-Real generalization across four dimensions: multi-level domain randomization, photorealistic rendering, physics-realistic modeling, and reinforcement learning updates. To support this study, we design a comprehensive evaluation protocol to quantify the real-world performance of manipulation tasks. The protocol accounts for key variations in background, lighting, distractors, object types, and spatial features. Through experiments involving over 10k real-world trials, we derive critical insights into Sim-to-Real transfer. To inform and advance future studies, we release both the robotic platforms and the evaluation protocol for public access to facilitate independent verification, thereby establishing a realistic and standardized benchmark for dexterous manipulation policies.

Santa Augmentcode Intent Ep.6

Dev.to

Your Agent Hired Another Agent. The Output Was Garbage. The Money's Gone.

Dev.to

ClawRouter vs TeamoRouter: one requires a crypto wallet, one doesn't

Dev.to

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Palantir’s billionaire CEO says only two kinds of people will succeed in the AI era: trade workers — ‘or you’re neurodivergent’

Reddit r/artificial

Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

Key Points

Abstract

Related Articles

Santa Augmentcode Intent Ep.6

Your Agent Hired Another Agent. The Output Was Garbage. The Money's Gone.

ClawRouter vs TeamoRouter: one requires a crypto wallet, one doesn't

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Palantir’s billionaire CEO says only two kinds of people will succeed in the AI era: trade workers — ‘or you’re neurodivergent’

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer