
One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies

arXiv cs.AI / 3/16/2026

📰 News / Models & Research

Key Points

  • The One-Step Flow Policy (OFP) is a from-scratch self-distillation framework that enables high-fidelity, single-step action generation for visuomotor policies without requiring a pre-trained teacher.
  • OFP combines a self-consistency loss to enforce coherent transport across time intervals and a self-guided regularization to sharpen predictions toward high-density expert modes, plus a warm-start mechanism that leverages temporal action correlations.
  • In 56 simulated manipulation tasks, one-step OFP achieves state-of-the-art results, outperforming 100-step diffusion and flow policies while delivering over 100x faster action generation.
  • When integrated into the pi_0.5 model on RoboTwin 2.0, one-step OFP surpasses the original 10-step policy, demonstrating practical, scalable low-latency robotic control.
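
The intuition behind single-step generation can be pictured with a toy sketch (our illustration, not the paper's model): when the learned transport follows straight paths, as in rectified-flow-style fields, a single Euler step from noise lands on the same endpoint as a 100-step ODE integration.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 2))       # toy "noise" samples
x1 = np.tile([1.0, -2.0], (4, 1))  # toy expert actions (a single mode)

def velocity(x, t):
    # Rectified-flow-style field for straight paths x_t = (1-t)*x0 + t*x1:
    # along such a path, (x1 - x) / (1 - t) equals the constant x1 - x0.
    return (x1 - x) / (1.0 - t)

def euler_sample(x, steps):
    # Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps.
    t, dt = 0.0, 1.0 / steps
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

one_step = euler_sample(x0, 1)      # single-step generation
many_steps = euler_sample(x0, 100)  # 100-step ODE integration
print(np.abs(one_step - many_steps).max())  # agrees up to float rounding
```

In this idealized case the two samplers coincide, which is why enforcing coherent (straight, self-consistent) transport makes one-step inference viable.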

Abstract

Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, degrading control frequency and harming performance in time-sensitive manipulation. To address this problem, we propose the One-Step Flow Policy (OFP), a from-scratch self-distillation framework for high-fidelity, single-step action generation without a pre-trained teacher. OFP unifies a self-consistency loss to enforce coherent transport across time intervals, and a self-guided regularization to sharpen predictions toward high-density expert modes. In addition, a warm-start mechanism leverages temporal action correlations to minimize the generative transport distance. Evaluations across 56 diverse simulated manipulation tasks demonstrate that a one-step OFP achieves state-of-the-art results, outperforming 100-step diffusion and flow policies while accelerating action generation by over 100x. We further integrate OFP into the pi_0.5 model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy. These results establish OFP as a practical, scalable solution for highly accurate and low-latency robot control.
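
The self-consistency idea can be sketched in a few lines (our illustration under assumed notation, not the paper's exact objective): a transport map f(x_t, t, s) is penalized when a direct jump from time t to s disagrees with composing two smaller jumps through an intermediate time u. A map that transports points exactly along straight paths incurs zero loss.

```python
import numpy as np

def self_consistency_loss(f, x_t, t, s, u):
    # Sketch of a self-consistency objective: the direct jump t -> s
    # should match the composed jumps t -> u -> s; in training, the
    # composed result would be a fixed (stop-gradient) target.
    direct = f(x_t, t, s)
    two_hop = f(f(x_t, t, u), u, s)
    return float(np.mean((direct - two_hop) ** 2))

# A perfectly self-consistent "model": exact straight-line transport
# toward a single toy expert action x1 (hypothetical, for illustration).
x1 = np.array([1.0, -2.0])

def exact_flow(x, t, s):
    # Slide the point at time t along the straight path to time s.
    return x + (s - t) * (x1 - x) / (1.0 - t)

x_t = np.array([0.3, 0.7])
loss = self_consistency_loss(exact_flow, x_t, t=0.0, s=1.0, u=0.5)
```

For the exact straight-line transport above the loss vanishes (up to float rounding); an inconsistent model would be pushed toward transport that a single large jump can reproduce.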