FASTER: Value-Guided Sampling for Fast RL
arXiv cs.LG · April 22, 2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- Reinforcement learning methods that sample multiple action candidates at test time can be highly effective, but they are computationally expensive because every candidate must be fully generated before the best one is selected.
- The paper introduces FASTER, which recovers the benefits of sampling-based test-time scaling for diffusion-based policies by tracing and filtering action candidates earlier in the denoising process.
- FASTER formulates the denoising-and-selection procedure as a Markov Decision Process (MDP) and learns a value-guided policy to progressively filter candidates while maximizing expected returns.
- Experiments on long-horizon manipulation tasks show FASTER improves both online and batch-online reinforcement learning performance and attains the best results among compared approaches.
- When applied to a pretrained vision-language-action (VLA) model, FASTER achieves comparable performance while substantially reducing both training and inference compute, and code is provided on GitHub.
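The core idea in the bullets above — prune low-value candidates during denoising rather than denoising all of them to completion — can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `denoise_step`, `value`, and all parameters are hypothetical stand-ins for a diffusion policy's denoiser and a learned value function.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = 1.0  # hypothetical "good" action, stand-in for high-return behavior

def denoise_step(candidates):
    """Hypothetical one-step denoiser: nudge noisy actions toward the target."""
    return candidates + 0.5 * (TARGET - candidates) + 0.01 * rng.standard_normal(candidates.shape)

def value(candidates):
    """Hypothetical learned value function: higher for actions nearer the target."""
    return -np.linalg.norm(candidates - TARGET, axis=1)

def value_guided_sampling(n_candidates=32, action_dim=4, n_steps=5, keep_frac=0.5):
    """Value-guided progressive filtering: instead of denoising all candidates
    to completion and selecting the best at the end, drop the lowest-value
    candidates after each denoising step, saving compute on the later steps."""
    candidates = rng.standard_normal((n_candidates, action_dim))
    for _ in range(n_steps):
        candidates = denoise_step(candidates)
        keep = max(1, int(len(candidates) * keep_frac))
        top = np.argsort(value(candidates))[::-1][:keep]  # keep top-valued candidates
        candidates = candidates[top]
    return candidates[0]  # single surviving action
```

With `keep_frac=0.5`, the candidate pool shrinks 32 → 16 → 8 → 4 → 2 → 1, so later (more expensive in a real diffusion policy) denoising steps run on far fewer candidates than the sample-everything-then-select baseline.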