Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance

arXiv cs.RO / March 27, 2026


Key Points

  • The paper introduces “Fast-dVLA,” a method aimed at improving pretrained VLA performance and lowering adaptation cost during standard supervised finetuning (SFT) without relying on heavy auxiliary losses.
  • It decouples the two goals of auxiliary-task training in parameter space, separating general capability enhancement from task-specific action-distribution fitting, by deriving "capability vectors" from runs trained to convergence on a small-scale task set.
  • The capability vectors are merged with the pretrained parameters to form a capability-enhanced meta model, intended to capture auxiliary-task benefits more efficiently (see the sketch after this list).
  • The approach then augments standard SFT with a lightweight orthogonal regularization term, reportedly matching auxiliary-finetuned baselines while reducing computational overhead.
  • Experiments reportedly show strong effectiveness across a variety of robot tasks, suggesting the method generalizes beyond a single benchmark.
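In parameter space, the decoupling reduces to simple weight arithmetic. Below is a minimal sketch, assuming the capability vector is the elementwise difference between two same-architecture checkpoints (one finetuned with auxiliary objectives, one with standard SFT, both on the same small task set); the merging coefficient `alpha`, the file paths, and the function names are illustrative, not taken from the paper.

```python
import torch

def capability_vector(theta_aux: dict, theta_std: dict) -> dict:
    """Elementwise parameter difference between a checkpoint finetuned
    with auxiliary objectives and one finetuned with standard SFT."""
    return {name: theta_aux[name] - theta_std[name] for name in theta_aux}

def merge_meta_model(theta_pre: dict, cap_vec: dict, alpha: float = 1.0) -> dict:
    """Add the scaled capability vector to the pretrained weights to
    form the capability-enhanced meta model."""
    return {name: theta_pre[name] + alpha * cap_vec[name] for name in theta_pre}

# Usage: the three state dicts must share the same architecture and keys.
theta_pre = torch.load("vla_pretrained.pt")  # pretrained VLA weights (illustrative path)
theta_aux = torch.load("vla_aux_sft.pt")     # converged with auxiliary losses
theta_std = torch.load("vla_std_sft.pt")     # converged with standard SFT
theta_meta = merge_meta_model(theta_pre, capability_vector(theta_aux, theta_std))
```

The design echoes task-arithmetic-style model merging: because the two small-scale runs differ only in training strategy, their parameter difference plausibly isolates what the auxiliary objectives contributed.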

Abstract

This paper proposes a novel approach to a common challenge: standard supervised finetuning (SFT) of pretrained VLA models often fails to effectively improve performance or reduce adaptation costs. Advanced finetuning methods with auxiliary training objectives can improve performance and reduce the number of convergence steps, but they typically incur significant computational overhead due to the additional losses from the auxiliary tasks. To combine the enhanced capabilities of auxiliary training with the simplicity of standard SFT, we decouple the two objectives of auxiliary-task training within the parameter space: enhancing general capabilities and fitting task-specific action distributions. To achieve this, we only need to train the model to convergence on a small-scale task set under two distinct training strategies. The difference between the resulting model parameters can then be interpreted as the capability vectors contributed by the auxiliary tasks. These vectors are merged with the pretrained parameters to form a capability-enhanced meta model. Moreover, when standard SFT is augmented with a lightweight orthogonal regularization loss, the merged model attains performance comparable to auxiliary-finetuned baselines at reduced computational overhead. Experimental results demonstrate that this approach is highly effective across diverse robot tasks. Project page: https://chris1220313648.github.io/Fast-dVLA/
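The abstract does not spell out the regularizer, so the following is only one plausible reading, not the paper's exact formulation: during SFT from the merged meta model, penalize the component of the parameter update that lies along the capability vector, so that task-specific fitting stays roughly orthogonal to the merged capabilities. The squared-cosine form, the weight `beta`, and all names here are assumptions.

```python
import torch
import torch.nn as nn

def orthogonality_penalty(model: nn.Module, theta_meta: dict, cap_vec: dict,
                          eps: float = 1e-8) -> torch.Tensor:
    """Assumed regularizer: squared cosine similarity between the SFT update
    (theta - theta_meta) and the capability vector, summed over parameter
    tensors. It is zero when the update is orthogonal to the vector."""
    # theta_meta / cap_vec: constant tensors keyed like named_parameters(),
    # assumed to live on the same device as the model.
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        delta = (p - theta_meta[name]).flatten()  # update since the merge
        v = cap_vec[name].flatten()               # frozen capability direction
        num = torch.dot(delta, v).pow(2)
        den = delta.pow(2).sum() * v.pow(2).sum() + eps
        penalty = penalty + num / den
    return penalty

# In the training loop (`beta` is an illustrative weight):
#   loss = sft_loss + beta * orthogonality_penalty(model, theta_meta, cap_vec)
```

The squared-cosine form keeps the penalty scale-invariant per tensor and numerically stable at initialization, where the update is exactly zero.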