VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

arXiv cs.RO / 5/5/2026


Key Points

  • VILAS is a low-cost, modular robotic manipulation platform aimed at enabling end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware.
  • The system combines a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera setup, coordinated via a ZMQ-based architecture that supports teleoperation, data collection, and policy deployment in one workflow.
  • To safely handle fragile objects without explicit force sensing, VILAS uses a kirigami-based soft compliant gripper extension that creates predictable deformation under compression for gentle, repeatable contact.
  • The authors fine-tune three leading VLA models (pi_0, pi_0.5, GR00T N1.6) from public checkpoints using an identical teleoperation-collected demonstration dataset, then validate performance on a grape grasping task.
  • Experiments suggest that effective VLA policies can be trained and deployed using low-cost modular hardware, and the study offers practical guidance on how current VLA models behave in real-world deployment settings.
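
The paper does not publish its message schema, but the ZMQ-based coordination described above can be illustrated with a minimal request/reply exchange between a robot-side client and a policy server. This is a hedged sketch only: the port, message fields, and the dummy action are illustrative assumptions, not the VILAS implementation.

```python
# Minimal sketch of a ZMQ REQ/REP loop between a robot client and a policy
# server, loosely in the spirit of VILAS's ZMQ-based architecture.
# Assumptions (not from the paper): port 5555, JSON messages with
# "joints"/"task" observation fields and a "gripper"/"delta" action.
import threading
import zmq

def policy_server(ctx: zmq.Context, ready: threading.Event) -> None:
    """Answer one observation with a (dummy) action, then shut down."""
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://127.0.0.1:5555")
    ready.set()
    obs = sock.recv_json()  # e.g. {"joints": [...], "task": "..."}
    action = {"gripper": "close", "delta": [0.0] * 6, "task": obs["task"]}
    sock.send_json(action)
    sock.close()

ctx = zmq.Context()
ready = threading.Event()
server = threading.Thread(target=policy_server, args=(ctx, ready))
server.start()
ready.wait()

# Robot-side client: send the current observation, block for the next action.
client = ctx.socket(zmq.REQ)
client.connect("tcp://127.0.0.1:5555")
client.send_json({"joints": [0.0] * 6, "task": "pick up the grape"})
action = client.recv_json()
print(action["gripper"])  # "close"

client.close()
server.join()
ctx.term()
```

A strict REQ/REP pair like this enforces lock-step observation→action cycling; a real deployment would more likely mix PUB/SUB camera streams with a REQ/REP control channel, which the same context can host on separate sockets.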

Abstract

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
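The abstract's pipeline hinges on collecting identical teleoperation demonstrations for all three models. As a hedged sketch of what such a collection step might look like, the recorder below logs synchronized robot-state frames per episode; the field names, JSON layout, and `EpisodeRecorder` class are assumptions for illustration, not the paper's actual dataset format.

```python
# Sketch of a per-episode teleoperation recorder: each frame captures a
# timestamp, joint positions, gripper width, and a reference to a camera image.
# All names and the on-disk format are hypothetical, not the VILAS format.
import json
import os
import tempfile
import time

class EpisodeRecorder:
    def __init__(self, task: str):
        self.task = task          # language instruction paired with the episode
        self.frames = []          # synchronized state frames, in time order

    def record(self, joints, gripper_width, image_path=None):
        """Append one frame of robot state plus an optional camera-frame path."""
        self.frames.append({
            "t": time.time(),
            "joints": list(joints),
            "gripper_width": gripper_width,
            "image": image_path,
        })

    def save(self, path):
        """Write the episode as JSON; return the number of frames saved."""
        with open(path, "w") as f:
            json.dump({"task": self.task, "frames": self.frames}, f)
        return len(self.frames)

# Usage: two frames of a (fake) grape-grasping demonstration.
rec = EpisodeRecorder("pick up the grape")
rec.record([0.0] * 6, gripper_width=0.05)
rec.record([0.1] * 6, gripper_width=0.02)
out_path = os.path.join(tempfile.gettempdir(), "episode_000.json")
n_frames = rec.save(out_path)
print(n_frames)  # 2
```

Keeping the language instruction in the episode header, as above, is what lets the same demonstrations fine-tune language-conditioned policies such as pi_0, pi_0.5, and GR00T N1.6 without per-model reformatting of the raw logs.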