A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

arXiv cs.RO / 4/8/2026

Key Points

  • The paper introduces A1, an open-source Vision-Language-Action (VLA) framework that targets the high compute and latency costs of billion-scale VLM backbones and diffusion/flow action heads, which currently limit real-time robotic control.
  • A1 uses pretrained VLMs as implicit affordance priors for action generation, while also targeting the end-to-end inference pipeline with a budget-aware adaptive scheme.
  • It implements early termination by monitoring action consistency across intermediate VLM layers: once the decoded action stabilizes, the remaining backbone layers are skipped (see the sketch after this list).
  • The proposed “Inter-Layer Truncated Flow Matching” warm-starts denoising across layers, producing accurate actions with substantially fewer denoising iterations (sketched after the abstract below).
  • Experiments on simulation benchmarks (LIBERO, VLABench), real robots (Franka, AgiBot), and RoboChallenge report state-of-the-art success rates together with large efficiency gains (e.g., up to 72% lower per-episode latency and up to 76.6% less backbone computation with only minor performance degradation).
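
A minimal sketch of what such layer-wise early termination could look like, assuming one lightweight action readout per monitored VLM layer. The names (`vlm_layers`, `readouts`, the tolerance `tol`) are illustrative assumptions, not A1's actual API:

```python
import torch

@torch.no_grad()
def early_exit_action(vlm_layers, readouts, hidden, tol=0.05):
    """Hypothetical sketch: run VLM blocks one at a time and stop once the
    decoded action stops changing between consecutive layers."""
    prev_action = None
    for layer, readout in zip(vlm_layers, readouts):
        hidden = layer(hidden)                 # one transformer block forward
        action = readout(hidden.mean(dim=1))   # pool tokens -> candidate action
        if prev_action is not None:
            # Consistency check: relative change between successive layers.
            delta = (action - prev_action).norm() / (prev_action.norm() + 1e-6)
            if delta < tol:
                return action                  # early exit: skip deeper layers
        prev_action = action
    return prev_action                         # fell through all layers
```

The per-layer readouts here stand in for whatever mechanism A1 actually uses to decode intermediate actions; the point is simply that the backbone stops as soon as successive layers agree within a tolerance.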

Abstract

Vision-Language-Action (VLA) models have emerged as a powerful paradigm for open-world robot manipulation, but their practical deployment is often constrained by cost: billion-scale VLM backbones and iterative diffusion/flow-based action heads incur high latency and compute, making real-time control expensive on commodity hardware. We present A1, a fully open-source and transparent VLA framework designed for low-cost, high-throughput inference without sacrificing manipulation success. Our approach leverages pretrained VLMs that provide implicit affordance priors for action generation. We release the full training stack (training code, data/data-processing pipeline, intermediate checkpoints, and evaluation scripts) to enable end-to-end reproducibility. Beyond optimizing the VLM alone, A1 targets the full inference pipeline with a budget-aware adaptive inference scheme that jointly accelerates the backbone and the action head. Specifically, we monitor action consistency across intermediate VLM layers to trigger early termination, and propose Inter-Layer Truncated Flow Matching, which warm-starts denoising across layers, enabling accurate actions with substantially fewer effective denoising iterations. Across simulation benchmarks (LIBERO, VLABench) and real robots (Franka, AgiBot), A1 achieves state-of-the-art success rates while significantly reducing inference cost (e.g., up to 72% lower per-episode latency for flow-matching inference and up to 76.6% backbone computation reduction with minor performance degradation). On RoboChallenge, A1 achieves an average success rate of 29.00%, outperforming baselines including π0 (28.33%), X-VLA (21.33%), and RDT-1B (15.00%).
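
To make the warm-starting idea behind Inter-Layer Truncated Flow Matching concrete, below is a hypothetical sketch under simplifying assumptions: a learned velocity field `velocity_head(x, t, cond)` is integrated with plain Euler steps, and features from each successively deeper VLM layer resume the integration from the previous layer's partially denoised action rather than from fresh noise. The function and argument names are invented for illustration and do not come from the paper.

```python
import torch

@torch.no_grad()
def truncated_flow_actions(velocity_head, layer_conds, action_dim,
                           steps_per_layer=2):
    """Hypothetical warm-started flow matching: each layer integrates only
    a short segment of the flow, continuing from the previous layer's x."""
    total_steps = len(layer_conds) * steps_per_layer
    dt = 1.0 / total_steps
    x = torch.randn(1, action_dim)     # draw noise once, at t = 0
    t = 0.0
    for cond in layer_conds:           # features from deeper and deeper layers
        for _ in range(steps_per_layer):
            # Euler step along the learned velocity field, warm-started at x.
            x = x + dt * velocity_head(x, torch.tensor([t]), cond)
            t += dt
    return x                           # approximately denoised action at t = 1
```

Because each layer continues the flow for only `steps_per_layer` Euler steps instead of restarting from noise, the total number of effective denoising iterations stays small, which is the kind of saving the paper's reported iteration and latency reductions point to.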