AnchorVLA: Anchored Diffusion for Efficient End-to-End Mobile Manipulation

arXiv cs.RO / 4/3/2026


Key Points

  • AnchorVLA targets a core mobile-manipulation challenge — preserving multiple plausible action modes while staying highly reactive during execution — by making diffusion policies efficient enough for closed-loop control.
  • To avoid running full iterative denoising at every control step, it denoises only locally around anchor trajectories ("anchored diffusion") and uses a truncated diffusion schedule, reducing inference latency.
  • To counter drift caused by the partially open-loop nature of action chunking, a test-time self-correction mechanism (a lightweight residual correction module) applies high-frequency, per-step adjustments that improve rollout stability.
  • Across multiple mobile manipulation tasks, and under disturbances and distribution shift, the paper reports improved success rates and stability while maintaining low-latency inference.
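The anchored, truncated denoising idea above can be sketched as follows. This is a minimal DDPM-style illustration, not AnchorVLA's implementation: `anchored_sample`, `denoise_fn`, and the schedule parameters are all assumed names, and the key point is that sampling starts from a forward-diffused anchor at an intermediate timestep `t_start` and runs only `t_start` reverse steps instead of the full schedule.

```python
import numpy as np

def anchored_sample(anchor, denoise_fn, betas, t_start):
    """Truncated diffusion around an anchor trajectory (illustrative).

    Instead of denoising from pure noise over the full schedule,
    perturb the anchor to intermediate noise level t_start, then run
    only t_start reverse DDPM steps. denoise_fn(x, t) is a stand-in
    for a learned noise predictor.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Forward-diffuse the anchor to timestep t_start: q(x_t | x_0).
    x = (np.sqrt(alpha_bar[t_start - 1]) * anchor
         + np.sqrt(1.0 - alpha_bar[t_start - 1]) * np.random.randn(*anchor.shape))
    # Reverse diffusion for t = t_start-1 .. 0 (standard DDPM update).
    for t in reversed(range(t_start)):
        eps = denoise_fn(x, t)                          # predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                                       # no noise on the last step
            x += np.sqrt(betas[t]) * np.random.randn(*x.shape)
    return x
```

Because the reverse loop runs `t_start` steps rather than `len(betas)`, inference cost drops roughly in proportion to the truncation, while sampling can still land on different nearby modes of the action distribution.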

Abstract

A central challenge in mobile manipulation is preserving multiple plausible action modes while remaining reactive during execution. A bottle in a cluttered scene can often be approached and grasped in several valid ways, and robust behavior depends on preserving this action diversity as the scene evolves. Diffusion policies are appealing because they model multimodal action distributions rather than collapsing to one solution, but in practice full iterative denoising is costly at control time. Action chunking helps amortize inference, yet it also creates partially open-loop behavior, allowing small mismatches to accumulate into drift. We present AnchorVLA, a diffusion-based VLA policy for mobile manipulation built on the core insight that when sampling begins near a plausible solution manifold, extensive denoising is unnecessary to recover multimodal, valid actions. AnchorVLA combines a lightweight VLA adaptation backbone with an anchored diffusion action head, which denoises locally around anchor trajectories using a truncated diffusion schedule. This retains multimodal action generation while reducing inference cost for closed-loop control. Crucially, to mitigate chunking-induced drift, we introduce a test-time self-correction mechanism via a lightweight residual correction module that makes high-frequency, per-step adjustments during rollout. Across diverse mobile manipulation tasks, AnchorVLA improves success and stability under disturbances and distribution shifts while maintaining low-latency inference. The source code is made available at https://github.com/jason-lim26/AnchorVLA.
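As a toy illustration of the per-step residual correction idea, the sketch below executes an open-loop action chunk on a 1-D system with constant drift and adds a small feedback residual at every step. All names, the linear dynamics, and the proportional correction are assumptions for illustration; AnchorVLA's actual residual module is a learned network, not a fixed gain.

```python
import numpy as np

def rollout_chunk(state, chunk, target, gain=0.5, drift=0.05):
    """Execute an action chunk with per-step residual correction (toy).

    Each step applies the pre-committed open-loop action, a small
    feedback residual toward the target (standing in for a learned
    residual correction module), and a constant disturbance that
    models accumulating drift.
    """
    states = [state]
    for a in chunk:
        residual = gain * (target - state)    # lightweight per-step correction
        state = state + a + residual + drift  # open-loop action + correction + drift
        states.append(state)
    return np.array(states)
```

With `gain=0.0` the rollout is purely open-loop and the drift accumulates unchecked; with a modest gain the per-step residuals keep the final state much closer to the target, which is the stabilizing effect the abstract attributes to test-time self-correction.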