RL Token: Bootstrapping Online RL with Vision-Language-Action Models
arXiv cs.LG / 4/28/2026
Key Points
- The paper proposes RL Token (RLT), a lightweight method for sample-efficient online reinforcement learning fine-tuning of pretrained vision-language-action (VLA) models using only a few hours of real-world practice.
- RLT modifies a VLA to expose an “RL token” that retains task-relevant pretrained knowledge while providing an efficient interface for online RL, and it trains a small actor-critic head on top of this token to refine actions (a rough sketch of such a head follows this list).
- The learned policy is anchored to the underlying VLA, preserving pretrained capabilities while improving the precision and responsiveness needed for real-world manipulation.
- Experiments on four real-robot tasks (screw installation, zip tie fastening, charger insertion, and Ethernet insertion) show up to 3× speed improvements on the most difficult task phase and substantially higher success rates within minutes to a few hours.
- On some tasks, RLT even exceeds the speed of human teleoperation, highlighting its potential for fast, practical robotic skill adaptation.
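
To make the actor-critic-on-a-token idea concrete, here is a minimal PyTorch sketch of a small head that reads an RL-token embedding from a frozen VLA and refines the VLA's own action prediction with a bounded residual correction. All names, dimensions, and the residual anchoring scheme are illustrative assumptions, not the paper's actual architecture or update rule.

```python
import torch
import torch.nn as nn


class RLTokenHead(nn.Module):
    """Small actor-critic head over a frozen VLA's RL-token embedding.

    Hypothetical sketch: `token_dim`, `action_dim`, the layer sizes, and the
    residual-style anchoring are illustrative assumptions.
    """

    def __init__(self, token_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.critic = nn.Sequential(
            nn.Linear(token_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, rl_token: torch.Tensor, base_action: torch.Tensor,
                scale: float = 0.1):
        # Refine the pretrained VLA's action with a small, bounded correction
        # so the learned policy stays anchored to the base model's behavior.
        delta = torch.tanh(self.actor(rl_token)) * scale
        action = base_action + delta
        value = self.critic(rl_token)
        return action, value


# Usage: the frozen VLA would supply `rl_token` (its RL-token embedding) and
# `base_action` (its original action) at each step; only the head's few
# parameters are updated by the online RL algorithm.
head = RLTokenHead(token_dim=512, action_dim=7)
rl_token = torch.randn(1, 512)    # placeholder for the VLA's RL-token output
base_action = torch.zeros(1, 7)   # placeholder for the VLA's action prediction
action, value = head(rl_token, base_action)
```

The bounded residual is one common way to realize the anchoring described above: the refined policy cannot drift far from the pretrained VLA, and the head's small parameter count is what makes improvement within a few hours of real-world practice plausible.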