LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

arXiv cs.RO / 5/1/2026


Key Points

  • The paper introduces LaST-R1, a Vision-Language-Action (VLA) framework that performs latent chain-of-thought (CoT) reasoning over physical dynamics before executing actions.
  • It argues that prior VLA approaches either rely on slow/discretized explicit linguistic reasoning or use continuous latent reasoning while still being limited to static imitation learning.
  • The authors propose Latent-to-Action Policy Optimization (LAPO), an RL post-training method that jointly optimizes the latent reasoning process and the action generation, bridging reasoning and control.
  • LaST-R1 includes an adaptive latent CoT mechanism that adjusts the reasoning horizon dynamically according to environment complexity.
  • Experiments report a near-perfect 99.8% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, plus up to a 44% improvement over the warm-up policy in real-world deployments across four complex single- and dual-arm tasks, along with strong generalization across simulated and real-world environments.
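The adaptive latent CoT idea above — scaling the reasoning horizon with environment complexity — can be sketched as follows. This is a minimal illustration under assumed design choices (a linear complexity-to-steps mapping and a tanh recurrence over a latent state); the paper's actual mechanism is learned and may differ substantially:

```python
import numpy as np

def reasoning_horizon(complexity, min_steps=1, max_steps=8):
    """Map a normalized scene-complexity score in [0, 1] to a number of
    latent reasoning steps. The linear mapping is a hypothetical stand-in
    for the paper's learned adaptive mechanism."""
    complexity = float(np.clip(complexity, 0.0, 1.0))
    return min_steps + round(complexity * (max_steps - min_steps))

def latent_cot(obs_embedding, steps, seed=0):
    """Unroll `steps` latent reasoning updates before decoding an action.
    The fixed random recurrence is illustrative, not the model's
    architecture."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((obs_embedding.size, obs_embedding.size))
    z = obs_embedding.copy()
    for _ in range(steps):
        z = np.tanh(W @ z + obs_embedding)  # refine the latent "plan"
    return z
```

A simple scene would thus get a short horizon (one step at complexity 0) while a cluttered dual-arm scene near complexity 1 would get the full budget, trading compute for reasoning depth only when needed.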

Abstract

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing explicit linguistic reasoning that suffers from latency and discretization, or utilizing more expressive continuous latent reasoning, they are predominantly confined to static imitation learning that limits adaptability and generalization. While online reinforcement learning (RL) has been introduced to VLAs to enable trial-and-error exploration, current methods exclusively optimize the vanilla action space, bypassing the underlying physical reasoning process. In this paper, we present **LaST-R1**, a unified VLA framework that integrates latent Chain-of-Thought (CoT) reasoning over physical dynamics prior to action execution, along with a tailored RL post-training paradigm. Specifically, we propose **Latent-to-Action Policy Optimization (LAPO)**, a novel RL algorithm that jointly optimizes the latent reasoning process and the action generation. By bridging reasoning and control, LAPO improves the representation of physical world modeling and enhances robustness in interactive environments. Furthermore, an **adaptive latent CoT mechanism** is introduced to allow the policy to dynamically adjust its reasoning horizon based on environment complexity. Extensive experiments show that LaST-R1 achieves a near-perfect 99.8% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, significantly improving convergence speed and performance over prior state-of-the-art methods. In real-world deployments, LAPO post-training yields up to a 44% improvement over the initial warm-up policy across four complex tasks, including both single-arm and dual-arm settings. Finally, LaST-R1 demonstrates strong generalization across simulated and real-world environments.
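To make the "jointly optimizes the latent reasoning process and the action generation" idea concrete, here is a minimal REINFORCE sketch on a 1-D toy problem in which a single RL signal updates both a "latent reasoning" parameter and an "action head" parameter. Everything here (the Gaussian policy, the scalar weights `w_r` and `w_a`, the quadratic reward, the learning rates) is a hypothetical stand-in, not the paper's LAPO algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D problem (every quantity here is illustrative, not from the paper):
# state s -> latent z = tanh(w_r * s) -> action mean mu = w_a * z,
# sampled action a ~ N(mu, sigma^2), reward r = -(a - target)^2.
s, target, sigma = 1.0, 1.0, 0.5
w_r, w_a = 0.5, 0.1   # "latent reasoning" and "action head" parameters
lr, baseline = 0.01, 0.0

def policy_mean(w_r, w_a):
    return w_a * np.tanh(w_r * s)

mu_init = policy_mean(w_r, w_a)

for _ in range(5000):
    z = np.tanh(w_r * s)                    # latent reasoning step
    mu = w_a * z                            # action decoding
    a = mu + sigma * rng.standard_normal()  # explore around the mean
    r = -(a - target) ** 2
    baseline = 0.99 * baseline + 0.01 * r   # variance-reducing baseline
    g_mu = (r - baseline) * (a - mu) / sigma**2  # REINFORCE grad w.r.t. mu
    # The same RL signal flows into BOTH parameter groups: this joint
    # latent-plus-action update is the gist of "latent-to-action" training.
    grad_w_a = g_mu * z
    grad_w_r = g_mu * w_a * (1 - z**2) * s
    w_a += lr * grad_w_a
    w_r += lr * grad_w_r

mu_final = policy_mean(w_r, w_a)
```

The contrast with the "vanilla action space" RL the abstract criticizes is the `grad_w_r` line: a conventional post-training scheme would stop the reward signal at the action head, leaving the latent reasoning parameters frozen at their imitation-learned values.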