Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation

arXiv cs.RO / 4/28/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces “Move-Then-Operate,” a vision-language action framework that splits robotic manipulation into two phases: coarse relocation (“move”) and contact-critical interaction (“operate”).
Instead of using a single monolithic policy, it uses a dual-expert policy with a learnable phase selector to isolate phase-specific dynamics and improve structural learning.
Phase labels are generated automatically using an MLLM-based pipeline, using contextual cues like end-effector velocity and subtask decomposition to better align robot behavior with human motor patterns.
On the RoboTwin2 benchmark, the method reaches a 68.9% average success rate, improving over the monolithic baseline by 24%, and achieving similar or better results than models trained on 10× more data with 40% fewer training steps.

Abstract

We present Move-Then-Operate, a Vision language action framework that explicitly decouples robotic manipulation into two distinct behavioral phases: coarse relocation (move) and contact-critical interaction (operate). Unlike monolithic policies that conflate these heterogeneous regimes, our architecture employs a dual-expert policy routed by a learnable phase selector, introducing a structural inductive bias that isolates phase-specific dynamics. Phase labels are automatically generated via an MLLM-based pipeline conditioned on lightweight contextual cues such as end-effector velocity and subtask decomposition to ensure alignment with human motor patterns. Evaluated on the RoboTwin2 benchmark, our method achieves an average success rate of

68.9\%

, outperforming the monolithic

\pi_0

baseline by

24\%

. It matches or exceeds models trained on

10\times

more data and reaches peak performance in

40\%

fewer training steps, demonstrating that architectural disentanglement of move and operate phases is a highly effective and efficient strategy for mastering high-precision manipulation.

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 4/28DailyView insight →

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Same Agent, Different Risk | How Microsoft 365 Copilot Grounding Changes the Security Model | Rahsi Framework™

Dev.to

Claude Haiku for Low-Cost AI Inference: Patterns from a Horse Racing Prediction System

Dev.to

How We Built an Ambient AI Clinical Documentation Pipeline (and Saved Doctors 8+ Hours a Week)

Dev.to

🦀 PicoClaw Deep Dive — A Field Guide to Building an Ultra-Light AI Agent in Go 🐹

Dev.to

Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation

Key Points

Abstract

💡 Insights using this article

Related Articles

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Same Agent, Different Risk | How Microsoft 365 Copilot Grounding Changes the Security Model | Rahsi Framework™

Claude Haiku for Low-Cost AI Inference: Patterns from a Horse Racing Prediction System

How We Built an Ambient AI Clinical Documentation Pipeline (and Saved Doctors 8+ Hours a Week)

🦀 PicoClaw Deep Dive — A Field Guide to Building an Ultra-Light AI Agent in Go 🐹

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer