DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

arXiv cs.RO / 3/30/2026

Key Points

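DFM-VLA proposes a decoding method for Vision-Language-Action (VLA) models that represent actions as discrete tokens: the generated action tokens are updated iteratively, so errors can be corrected after the fact.
Built on discrete flow matching, the method learns a "probability velocity field" that dynamically updates the entire token sequence at each iteration, and it examines two formulations: an auxiliary velocity-head formulation and an action-embedding-guided formulation.
It also adopts a two-stage decoding strategy that pairs an iterative refinement stage with a subsequent deterministic validation stage to achieve stable convergence.
On CALVIN, LIBERO, and real-world robot manipulation tasks, it is reported to consistently outperform autoregressive VLAs as well as discrete and continuous diffusion baselines while maintaining inference efficiency.
Specifically, it reportedly achieves an average success length of 4.44 on CALVIN and an average success rate of 95.7% on LIBERO, demonstrating the effectiveness of action refinement via discrete flow matching.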
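To illustrate why iterative refinement lets early mistakes be corrected, here is a minimal sketch of a generic discrete-flow-matching sampler using the standard mixture probability path with a linear scheduler κ_t = t. It is not DFM-VLA's auxiliary velocity-head or action-embedding-guided formulation: the `Denoiser` interface, `dfm_refine`, and the oracle denoiser are hypothetical stand-ins for a VLA's token head. The property it shows is that every position can be resampled at every step, whereas the paper notes that autoregressive and discrete diffusion decoders typically fix a token once it is generated.

```python
import random
from typing import Callable, List

# Hypothetical stand-in for a VLA token head (an assumption, not the paper's API):
# maps (current tokens, time t) to one categorical distribution per position over
# the action vocabulary, i.e. an estimate of p_{1|t}(x1_i | x_t).
Denoiser = Callable[[List[int], float], List[List[float]]]


def dfm_refine(denoiser: Denoiser, x0: List[int], num_steps: int,
               rng: random.Random) -> List[int]:
    """Euler sampler for a mixture-path discrete flow with kappa_t = t.

    Per position the velocity is
        u_t(y | x) = kappa_dot / (1 - kappa_t) * (p_{1|t}(y | x) - delta_x(y)),
    so an Euler step of size h = 1 / num_steps resamples each token from
    p_{1|t} with probability h / (1 - t) and otherwise leaves it unchanged.
    Every position is eligible at every step, so earlier tokens can be revised.
    """
    x = list(x0)
    for k in range(num_steps):
        t = k / num_steps
        probs = denoiser(x, t)  # one model call per step, shared by all positions
        # h / (1 - t) simplifies to 1 / (num_steps - k); it equals 1 on the last step.
        jump_prob = 1.0 / (num_steps - k)
        for i, p in enumerate(probs):
            if rng.random() < jump_prob:
                x[i] = rng.choices(range(len(p)), weights=p)[0]
    return x


if __name__ == "__main__":
    VOCAB = 8
    target = [5, 1, 7, 2, 2, 0, 6]

    def oracle(x: List[int], t: float) -> List[List[float]]:
        # Toy denoiser that always predicts the target token with certainty.
        return [[1.0 if v == target[i] else 0.0 for v in range(VOCAB)]
                for i in range(len(x))]

    rng = random.Random(0)
    x0 = [rng.randrange(VOCAB) for _ in target]  # source distribution p_0: uniform noise
    print(x0, "->", dfm_refine(oracle, x0, num_steps=10, rng=rng))
```

The linear scheduler is chosen only for simplicity; a different κ_t changes the per-step jump rate to κ̇_t / (1 − κ_t). Because the toy oracle is certain, the final step always lands on the target, whereas a real, uncertain model leaves the output stochastic.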

Abstract

Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated, it is typically fixed and cannot be revised in subsequent iterations, so early token errors cannot be effectively corrected later. We propose DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA models a token-level probability velocity field that dynamically updates the full action sequence across refinement iterations. We investigate two ways to construct the velocity field: an auxiliary velocity-head formulation and an action-embedding-guided formulation. Our framework further adopts a two-stage decoding strategy with an iterative refinement stage followed by deterministic validation for stable convergence. Extensive experiments on CALVIN, LIBERO, and real-world manipulation tasks show that DFM-VLA consistently outperforms strong autoregressive, discrete diffusion, and continuous diffusion baselines in manipulation performance while retaining high inference efficiency. In particular, DFM-VLA achieves an average success length of 4.44 on CALVIN and an average success rate of 95.7% on LIBERO, highlighting the value of action refinement via discrete flow matching for robotic manipulation. Our project is available at https://chris1220313648.github.io/DFM-VLA/
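The abstract does not spell out what the deterministic validation stage computes. One plausible reading, offered purely as an assumption, is a greedy pass that re-queries the model and sets every token to its most likely value until the sequence stops changing. The sketch below implements that reading on its own, taking whatever sequence a stochastic refinement stage produced; `deterministic_validation`, `copy_left_neighbour`, the query at t = 1, and the `max_passes` cap are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption, not the paper's API): maps
# (current tokens, time t) to one categorical distribution per position.
Denoiser = Callable[[List[int], float], List[List[float]]]


def deterministic_validation(denoiser: Denoiser, x: List[int],
                             max_passes: int = 10) -> Tuple[List[int], int, bool]:
    """Greedy fixed-point pass run after stochastic refinement (assumed reading).

    Each pass queries the model at t = 1 and replaces every token with its
    argmax. Decoding stops once a pass leaves the sequence unchanged, so the
    output is a deterministic function of the refined input.
    Returns (tokens, passes used, converged flag).
    """
    x = list(x)
    for n in range(1, max_passes + 1):
        probs = denoiser(x, 1.0)
        proposal = [max(range(len(p)), key=p.__getitem__) for p in probs]
        if proposal == x:
            return x, n, True
        x = proposal
    return x, max_passes, False


def copy_left_neighbour(x: List[int], t: float, vocab: int = 4) -> List[List[float]]:
    # Toy context-dependent denoiser: position 0 predicts its own token and every
    # other position predicts its left neighbour's token with probability 0.7.
    out = []
    for i in range(len(x)):
        peak = x[0] if i == 0 else x[i - 1]
        out.append([0.7 if v == peak else 0.3 / (vocab - 1) for v in range(vocab)])
    return out


if __name__ == "__main__":
    refined = [2, 0, 1]  # hypothetical output of the refinement stage
    print(deterministic_validation(copy_left_neighbour, refined))
    # -> ([2, 2, 2], 3, True)
```

The toy denoiser needs three passes: two to propagate the first token rightward and one to confirm that nothing changes. The `max_passes` cap guards against models whose greedy updates oscillate instead of settling.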