PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
arXiv cs.AI / 5/1/2026
Key Points
- PRTS (Primitive Reasoning and Tasking System) is a new vision-language-action (VLA) foundation model that reframes robot pretraining from supervised behavior cloning to goal-conditioned reinforcement learning.
- It treats language instructions as goals and uses contrastive reinforcement learning to learn a shared embedding space in which the similarity between state-action and goal embeddings reflects how reachable and feasible the goal is over time.
- The model derives dense goal-reachability supervision from offline trajectories without requiring reward annotations, and integrates it into the VLM backbone via a role-aware causal mask that adds negligible overhead.
- Pretrained on 167B tokens, PRTS achieves state-of-the-art results across multiple LIBERO variants, SimplerEnv, and 14 real-world tasks, with especially large improvements for long-horizon, contact-rich, and zero-shot novel-instruction scenarios.
- Overall, the approach improves both execution success and long-horizon planning for general-purpose robotic policies by bridging semantic goal reasoning with temporal task progress.
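To make the core idea concrete, the contrastive objective described above can be sketched as a batch-level InfoNCE-style loss that pulls each state-action embedding toward its paired goal embedding and pushes it away from the other goals in the batch. This is a minimal illustration, not PRTS's actual implementation; the function name, the use of cosine similarity, the temperature value, and the symmetric-negative setup are all assumptions.

```python
import numpy as np

def contrastive_goal_loss(sa_emb: np.ndarray, goal_emb: np.ndarray,
                          temperature: float = 0.1) -> float:
    """InfoNCE-style contrastive loss over a batch of embeddings.

    Row i of sa_emb (a state-action embedding) is the positive match for
    row i of goal_emb (a language-goal embedding); every other row in the
    batch serves as a negative.
    """
    # Cosine similarity: L2-normalize both sides, then take dot products.
    sa = sa_emb / np.linalg.norm(sa_emb, axis=1, keepdims=True)
    g = goal_emb / np.linalg.norm(goal_emb, axis=1, keepdims=True)
    logits = sa @ g.T / temperature          # (batch, batch) similarity matrix

    # Numerically stable log-softmax over goals for each state-action row.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal; minimize their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))

# Toy check: embeddings aligned with their goals should score a lower loss
# than random, unaligned embeddings.
rng = np.random.default_rng(0)
goals = rng.normal(size=(8, 16))
aligned_sa = goals + 0.01 * rng.normal(size=(8, 16))   # near their paired goals
random_sa = rng.normal(size=(8, 16))                   # unrelated to the goals

low = contrastive_goal_loss(aligned_sa, goals)
high = contrastive_goal_loss(random_sa, goals)
```

In a goal-conditioned RL setting, the learned similarity then doubles as a reachability signal: states from which the goal is easily reached embed close to that goal, which is what lets the policy rank candidate goals without explicit reward annotations.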