DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

arXiv cs.RO / 4/29/2026


Key Points

  • The paper argues that physical AI needs asynchronous execution (“thinking while acting”) because synchronous executors’ inter-chunk pauses are harmful for dynamic environments, even if inference is fast.
  • It reviews real-time chunking (RTC) as an inpainting-style approach (freezing committed actions and generating the rest), and argues that RTC built on flow-matching policies is structurally suboptimal: it relies on inference-time corrections rather than the base policy, yielding little pre-training benefit while adding computation and latency.
  • The authors propose DiscreteRTC, using discrete diffusion policies that generate actions by iteratively unmasking, positioning them as a natural fit for asynchronous execution without extra external corrections.
  • DiscreteRTC is presented as fine-tuning-free for the inpainting behavior, with adaptive early stopping that lowers inference cost and improves execution success.
  • Experiments on dynamic simulated benchmarks and real-world dynamic manipulation tasks reportedly show higher success rates than continuous RTC and other baselines, including a 50% higher success rate on a real-world dynamic pick task versus flow-matching-based RTC.
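
The inpainting-style chunk transition described above can be sketched in a few lines: committed actions are placed in the chunk as already-unmasked tokens, and the policy only ever fills masked positions, so freezing comes for free. Everything here is illustrative, not the paper's method: `MASK`, the discretized action vocabulary, the confidence-based unmasking order, and `denoise_step` (a random stub in place of a trained policy conditioned on observations) are all assumptions.

```python
import numpy as np

MASK = -1   # hypothetical mask token id
VOCAB = 16  # hypothetical discretized action vocabulary size
CHUNK = 8   # actions per chunk

def denoise_step(tokens, rng):
    """Stand-in for one denoising pass of a discrete diffusion policy:
    returns per-position logits. A real policy would condition on
    observations; here it is a random stub for illustration."""
    return rng.standard_normal((len(tokens), VOCAB))

def unmask_chunk(tokens, steps, rng):
    """Iteratively unmask: each step commits the masked position the
    (stub) model is most confident about. Already-unmasked positions,
    including the committed prefix, are never touched -- inpainting is
    the native operation, with no external correction step."""
    tokens = tokens.copy()
    for _ in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:  # trivial analogue of early stopping
            break
        logits = denoise_step(tokens, rng)
        conf = logits[masked].max(axis=1)   # confidence per masked slot
        pick = masked[conf.argmax()]        # most confident slot first
        tokens[pick] = logits[pick].argmax()
    return tokens

rng = np.random.default_rng(0)
committed = np.array([3, 7, 1])      # actions already executing (frozen)
chunk = np.full(CHUNK, MASK)
chunk[: len(committed)] = committed  # freeze committed prefix in place
out = unmask_chunk(chunk, steps=CHUNK, rng=rng)
print(out)
```

Note that the frozen prefix needs zero extra code: it is simply never in the masked set, which is the structural point the paper makes about discrete diffusion as an asynchronous executor.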

Abstract

Unlike chatbots, physical AI must act while the world keeps evolving. The inter-chunk pauses of synchronous executors are therefore fatal for dynamic tasks, regardless of how fast inference is. Asynchronous execution -- thinking while acting -- is thus a structural requirement, and real-time chunking (RTC) makes it viable by recasting chunk transitions as inpainting: freezing committed actions and consistently generating the remainder. However, RTC with a flow-matching policy is structurally suboptimal: its inpainting comes from inference-time corrections rather than the base policy, yielding little pre-training benefit while requiring specific fine-tuning, heuristic guidance, and extra computation that inflates latency. In this work, we observe that discrete diffusion policies, which generate actions by iteratively unmasking, are natural asynchronous executors that resolve all of these limitations at once: they are fine-tuning free, since inpainting is their native operation, while early stopping further provides adaptive guidance and reduces inference cost. We propose DiscreteRTC, which replaces external corrections with native unmasking, and show on dynamic simulated benchmarks and real-world dynamic manipulation tasks that it achieves higher success rates than continuous RTC and other baselines. In summary, DiscreteRTC is simpler to implement, with 0 lines of code for async inpainting; faster at inference, with only 0.7x the computation of generating actions from scratch; and better at execution, with a 50% higher success rate on a real-world dynamic pick task than flow-matching-based RTC. More visualizations are available at https://outsider86.github.io/DiscreteRTCSite/.