VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

arXiv cs.RO / 5/5/2026


Key Points

  • VILAS is a low-cost, modular robotic manipulation platform aimed at enabling end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware.
  • The system combines a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera setup, coordinated via a ZMQ-based architecture that supports teleoperation, data collection, and policy deployment in one workflow.
  • To safely handle fragile objects without explicit force sensing, VILAS uses a kirigami-based soft compliant gripper extension that creates predictable deformation under compression for gentle, repeatable contact.
  • The authors fine-tune three leading VLA models (pi_0, pi_0.5, GR00T N1.6) from public checkpoints using an identical teleoperation-collected demonstration dataset, then validate performance on a grape grasping task.
  • Experiments suggest that effective VLA policies can be trained and deployed using low-cost modular hardware, and the study offers practical guidance on how current VLA models behave in real-world deployment settings.
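
The paper does not publish its message schema, but the ZMQ-based coordination described above can be illustrated with a minimal request/reply exchange between a robot-side client and a policy server. This is a hedged sketch only: the port, message fields, and the dummy action are illustrative assumptions, not the VILAS implementation.

```python
# Minimal sketch of a ZMQ REQ/REP loop between a robot client and a policy
# server, loosely in the spirit of VILAS's ZMQ-based architecture.
# Assumptions (not from the paper): port 5555, JSON messages with
# "joints"/"task" observation fields and a "gripper"/"delta" action.
import threading
import zmq

def policy_server(ctx: zmq.Context, ready: threading.Event) -> None:
    """Answer one observation with a (dummy) action, then shut down."""
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://127.0.0.1:5555")
    ready.set()
    obs = sock.recv_json()  # e.g. {"joints": [...], "task": "..."}
    action = {"gripper": "close", "delta": [0.0] * 6, "task": obs["task"]}
    sock.send_json(action)
    sock.close()

ctx = zmq.Context()
ready = threading.Event()
server = threading.Thread(target=policy_server, args=(ctx, ready))
server.start()
ready.wait()

# Robot-side client: send the current observation, block for the next action.
client = ctx.socket(zmq.REQ)
client.connect("tcp://127.0.0.1:5555")
client.send_json({"joints": [0.0] * 6, "task": "pick up the grape"})
action = client.recv_json()
print(action["gripper"])  # "close"

client.close()
server.join()
ctx.term()
```

A strict REQ/REP pair like this enforces lock-step observation→action cycling; a real deployment would more likely mix PUB/SUB camera streams with a REQ/REP control channel, which the same context can host on separate sockets.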

Abstract

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
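The abstract's pipeline hinges on collecting identical teleoperation demonstrations for all three models. As a hedged sketch of what such a collection step might look like, the recorder below logs synchronized robot-state frames per episode; the field names, JSON layout, and `EpisodeRecorder` class are assumptions for illustration, not the paper's actual dataset format.

```python
# Sketch of a per-episode teleoperation recorder: each frame captures a
# timestamp, joint positions, gripper width, and a reference to a camera image.
# All names and the on-disk format are hypothetical, not the VILAS format.
import json
import os
import tempfile
import time

class EpisodeRecorder:
    def __init__(self, task: str):
        self.task = task          # language instruction paired with the episode
        self.frames = []          # synchronized state frames, in time order

    def record(self, joints, gripper_width, image_path=None):
        """Append one frame of robot state plus an optional camera-frame path."""
        self.frames.append({
            "t": time.time(),
            "joints": list(joints),
            "gripper_width": gripper_width,
            "image": image_path,
        })

    def save(self, path):
        """Write the episode as JSON; return the number of frames saved."""
        with open(path, "w") as f:
            json.dump({"task": self.task, "frames": self.frames}, f)
        return len(self.frames)

# Usage: two frames of a (fake) grape-grasping demonstration.
rec = EpisodeRecorder("pick up the grape")
rec.record([0.0] * 6, gripper_width=0.05)
rec.record([0.1] * 6, gripper_width=0.02)
out_path = os.path.join(tempfile.gettempdir(), "episode_000.json")
n_frames = rec.save(out_path)
print(n_frames)  # 2
```

Keeping the language instruction in the episode header, as above, is what lets the same demonstrations fine-tune language-conditioned policies such as pi_0, pi_0.5, and GR00T N1.6 without per-model reformatting of the raw logs.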