DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models

arXiv cs.RO / April 14, 2026


Key Points

  • Vision-Language-Action (VLA) models face deployment challenges on resource-limited robots, and naive Post-Training Quantization (PTQ) can severely degrade sequential control performance.
  • The paper identifies temporal error accumulation at the vision-language-to-action interface as the driver of kinematic drift, where small quantization perturbations progressively amplify over time.
  • It introduces Drift-Aware Post-Training Quantization (DA-PTQ), casting quantization as a drift-aware optimization problem across sequential decision processes.
  • DA-PTQ uses (1) Cross-Space Representation Compensation to reduce structured distortions between multimodal representations and the action space, and (2) Motion-Driven Mixed-Precision Allocation to choose bit-widths by minimizing trajectory-level motion errors.
  • Experiments indicate DA-PTQ substantially reduces kinematic drift and can match full-precision performance under low-bit quantization settings, supporting efficient robotic deployment.
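To see why the paper treats temporal error accumulation as the core failure mode, consider a toy sketch (all names and the 1-D integrator dynamics are illustrative assumptions, not the paper's setup): a small, systematic per-step perturbation on the action, of the kind naive PTQ can introduce at the vision-language-to-action interface, compounds over the control horizon into trajectory-level drift.

```python
def simulate_drift(steps, quant_error=0.01):
    """Toy 1-D integrator: the policy outputs a velocity of 1.0 each step,
    but quantization perturbs the executed action by a small bias.
    Per-step errors accumulate, so position drift grows with the horizon."""
    pos_fp, pos_q = 0.0, 0.0
    for _ in range(steps):
        action = 1.0                    # full-precision action
        pos_fp += action                # ideal trajectory
        pos_q += action + quant_error   # quantized trajectory with bias
    return abs(pos_q - pos_fp)          # accumulated kinematic drift

drift_short = simulate_drift(steps=10)    # short horizon: small drift
drift_long = simulate_drift(steps=1000)   # long horizon: large drift
```

The point of the sketch is only that the drift is a function of horizon length, not of the per-step error alone, which is why single-step quantization metrics can look benign while sequential control degrades.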

Abstract

Vision-Language-Action models (VLAs) have demonstrated strong potential for embodied AI, yet their deployment on resource-limited robots remains challenging due to high memory and computational demands. While Post-Training Quantization (PTQ) provides an efficient solution, directly applying PTQ to VLAs often results in severe performance degradation during sequential control. We identify temporal error accumulation as a key factor, where quantization perturbations at the vision-language-to-action interface are progressively amplified, leading to kinematic drift in executed trajectories. To address this issue, we propose Drift-Aware Post-Training Quantization (DA-PTQ), which formulates quantization as a drift-aware optimization problem over sequential decision processes. DA-PTQ consists of two components: (1) Cross-Space Representation Compensation, which mitigates structured distortions between multimodal representations and action space to improve action consistency, and (2) Motion-Driven Mixed-Precision Allocation, which assigns bit-widths by minimizing trajectory-level motion errors. Extensive experiments show that DA-PTQ significantly reduces kinematic drift and achieves comparable performance to full-precision models under low-bit settings, enabling practical deployment of VLAs on resource-limited robotic platforms.
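The second component, Motion-Driven Mixed-Precision Allocation, can be sketched as a search over per-layer bit-widths that minimizes a trajectory-level error proxy under a total bit budget. This is a minimal illustration under assumed conventions, not the paper's algorithm: the `sensitivities` values and the `2**(-bits)` noise model are hypothetical stand-ins for the trajectory-level motion error the paper optimizes.

```python
import itertools

def trajectory_error(bits_per_layer, sensitivities):
    """Proxy for trajectory-level motion error: each layer's quantization
    noise shrinks as 2**(-bits), weighted by its motion sensitivity."""
    return sum(s * 2.0 ** (-b) for s, b in zip(sensitivities, bits_per_layer))

def allocate_bits(sensitivities, budget, choices=(2, 4, 8)):
    """Exhaustive search over per-layer bit-widths: return the assignment
    with the lowest error proxy whose total bits fit within the budget."""
    best, best_err = None, float("inf")
    for assignment in itertools.product(choices, repeat=len(sensitivities)):
        if sum(assignment) > budget:
            continue
        err = trajectory_error(assignment, sensitivities)
        if err < best_err:
            best, best_err = assignment, err
    return best

# Hypothetical sensitivities: layers nearer the action head matter more.
bits = allocate_bits(sensitivities=[1.0, 4.0, 16.0], budget=14)
```

Under this proxy the most drift-sensitive layer receives the highest precision, matching the intuition that bits should be spent where quantization noise most distorts executed motion; the paper's actual allocation criterion is the trajectory-level motion error it defines, not this toy model.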