IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
arXiv cs.LG, March 13, 2026
Key Points
- The paper investigates compute-optimal allocation for on-policy RL in LLMs, focusing on three resources: parallel rollouts per problem, number of problems per batch, and number of update steps.
- It shows that the compute-optimal number of parallel rollouts per problem grows with compute budget and then saturates, driven by solution sharpening on easy problems and coverage expansion on hard problems.
- Increasing the number of parallel rollouts per problem reduces interference across problems, whereas the number of problems per batch mainly affects training stability and can be chosen from a broad range without hurting compute-optimality.
- Validated across base models and data distributions, the work reframes RL scaling laws as prescriptive allocation rules and offers practical guidance for compute-efficient LLM RL post-training.
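The allocation pattern the paper describes, rollouts per problem that grow with the compute budget and then saturate, with the remaining budget split between batch size and update steps, can be sketched as follows. This is an illustrative toy, not the paper's fitted scaling law: the function name `allocate_rollouts`, the square-root growth shape, and the constants `r_min`, `r_max`, and `scale` are all assumptions chosen only to show the qualitative grow-then-saturate behavior.

```python
import math

def allocate_rollouts(budget, r_min=4, r_max=64, scale=1024):
    """Hypothetical saturating rule: rollouts per problem grow with
    the compute budget (sub-linearly here), then plateau at r_max.
    All constants are illustrative, not taken from the paper."""
    r = r_min * math.sqrt(budget / scale)
    return max(r_min, min(r_max, int(r)))

def allocate_batch(budget, rollouts, steps):
    """Problems per batch absorb whatever budget remains once
    rollouts per problem and update steps are fixed; per the
    summary, a broad range of values works similarly well."""
    return max(1, budget // (rollouts * steps))

# Small budget: rollouts stay at the floor.
print(allocate_rollouts(1024))        # 4
# Larger budget: rollouts grow...
print(allocate_rollouts(1024 * 64))   # 32
# ...and eventually saturate at r_max.
print(allocate_rollouts(10**9))       # 64
```

The saturation reflects the two regimes the summary mentions: early on, extra rollouts buy solution sharpening on easy problems and coverage on hard ones; past the plateau, additional samples per problem stop paying off and the budget is better spent elsewhere.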




