RISE: Self-Improving Robot Policy with Compositional World Model

arXiv cs.RO / 4/29/2026


Key Points

  • The paper introduces RISE, a framework for robotic reinforcement learning that aims to make Vision-Language-Action (VLA) policies more robust in contact-rich, dynamic manipulation tasks where small execution errors can cascade into failures.
  • RISE uses a Compositional World Model with a controllable dynamics component to predict multi-view future states and a progress/value model to score imagined outcomes and compute informative advantages.
  • By separating the world model from the value/evaluation component, the approach lets state prediction and value estimation each use a distinct architecture and training objective best suited to its role.
  • The system runs a closed-loop “self-improving” pipeline that repeatedly generates imagined rollouts, estimates advantages, and updates the policy entirely in imaginary space, reducing the need for expensive and risky on-policy physical RL.
  • Experiments on three real-world tasks show substantial gains over prior work, including more than +35% absolute improvement in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.
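The self-improving loop described above can be sketched in a few lines. The snippet below is a toy illustration, not the paper's implementation: `dynamics_model`, `progress_value`, and the advantage-weighted update are hypothetical stand-ins for RISE's controllable dynamics model, progress/value model, and policy-improvement step, using simple linear latent dynamics and a Gaussian policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the controllable dynamics model:
# toy linear dynamics in a 2-D latent space.
def dynamics_model(state, action):
    return 0.9 * state + 0.1 * action

# Hypothetical stand-in for the progress/value model:
# states closer to the goal (the origin) score higher.
def progress_value(state):
    return -np.linalg.norm(state)

def imagined_rollout(policy_mean, state, horizon=5):
    """Roll out an imagined trajectory and score its final state."""
    actions = []
    for _ in range(horizon):
        action = policy_mean + 0.1 * rng.standard_normal(state.shape)
        state = dynamics_model(state, action)
        actions.append(action)
    return actions, progress_value(state)

def improve_policy(policy_mean, state, n_rollouts=64, lr=0.5):
    """One advantage-weighted update, computed entirely in imagination."""
    rollouts = [imagined_rollout(policy_mean, state) for _ in range(n_rollouts)]
    values = np.array([v for _, v in rollouts])
    advantages = values - values.mean()            # baseline-subtracted
    weights = np.exp(advantages / (advantages.std() + 1e-8))
    weights /= weights.sum()
    # Shift the policy toward the first actions of high-advantage rollouts.
    first_actions = np.stack([a[0] for a, _ in rollouts])
    target = (weights[:, None] * first_actions).sum(axis=0)
    return policy_mean + lr * (target - policy_mean)

# Closed loop: imagine, score, update -- no physical interaction.
state = np.ones(2)
policy = np.zeros(2)
for _ in range(20):
    policy = improve_policy(policy, state)
```

After a few iterations the policy mean drifts toward actions that drive the latent state to the goal, mirroring the paper's idea that rollouts, advantages, and updates all happen in imagined space rather than on the robot.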

Abstract

Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risks, hardware cost, and the need for environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures with a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows state prediction and value estimation to be handled by distinct, best-suited architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvements over prior art, with absolute performance gains of more than +35% on dynamic brick sorting, +45% on backpack packing, and +35% on box closing.