PubSwap: Public-Data Off-Policy Coordination for Federated RLVR
arXiv cs.LG / 4/15/2026
Key Points
- The paper introduces PubSwap, a federated RLVR (reinforcement learning with verifiable rewards) framework aimed at scaling reasoning post-training beyond centralized settings.
- It reduces communication cost and client drift by combining LoRA-based local adaptation with periodic off-policy coordination on a small, shared public dataset.
- During public-data steps, PubSwap selectively replaces locally incorrect responses with globally correct ones, using shared response-level signals to keep clients more closely aligned with the global objective.
- The authors report consistent improvements over standard baselines on both mathematical and medical reasoning benchmarks, suggesting the approach carries over across reasoning domains.
- Overall, the work presents a practical “recipe” for federated reasoning post-training that combines low-rank updates with lightweight public-data anchoring without exposing private data.
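To make the communication argument concrete, the arithmetic below compares how many numbers a client would upload per round for a dense update versus a rank-r LoRA update. It is a minimal sketch under assumed dimensions: the 4096 x 4096 matrix shape, rank 16, 32 layers with four adapted matrices each, 2-byte (bf16) values, and the idea that the full adapter is sent every round are illustrative choices, not settings reported for PubSwap.

```python
def full_update_params(d: int, k: int) -> int:
    """Numbers sent for a dense update of one d x k weight matrix."""
    return d * k


def lora_update_params(d: int, k: int, r: int) -> int:
    """Numbers sent for a rank-r LoRA update: B is d x r and A is r x k."""
    return r * (d + k)


def upload_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Payload size in bytes, assuming 2-byte (bf16) values."""
    return num_params * bytes_per_param


# Illustrative shapes (assumptions): 32 layers, 4 adapted 4096 x 4096 matrices each, rank 16.
LAYERS, MATS_PER_LAYER, D, K, R = 32, 4, 4096, 4096, 16
n_mats = LAYERS * MATS_PER_LAYER

full_total = n_mats * full_update_params(D, K)
lora_total = n_mats * lora_update_params(D, K, R)

print(full_total, lora_total)                             # 2147483648 16777216
print(full_total // lora_total)                           # 128
print(upload_bytes(lora_total) / 2**20, "MiB per round")  # 32.0 MiB per round
```

For square matrices the saving is d / (2r), so it grows as the rank shrinks; the sketch says nothing about which rank or which target matrices PubSwap actually uses.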
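The public-data swap can be illustrated with a small, hypothetical sketch. It assumes binary verifiable rewards, a server that pools correct responses for each shared public prompt, and a GRPO-style group-normalized advantage; the names `Response`, `pool_correct`, `swap_incorrect`, `group_advantages`, and the `max_swaps` budget are invented for illustration, and the importance weighting a real off-policy update would need is omitted. It is not the authors' algorithm.

```python
import random
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    correct: bool       # binary verifiable reward (assumption)
    source_client: int  # id of the client whose policy generated it


def pool_correct(responses_by_client):
    """Hypothetical server step: gather correct responses per public prompt."""
    pool = {}
    for per_prompt in responses_by_client.values():
        for prompt_id, responses in per_prompt.items():
            pool.setdefault(prompt_id, []).extend(r for r in responses if r.correct)
    return pool


def swap_incorrect(local, pool, rng, max_swaps=1):
    """Hypothetical client step: replace up to max_swaps incorrect responses
    per prompt with correct ones drawn from the shared pool. The swapped-in
    responses may come from other clients' policies, hence off-policy."""
    swapped = {}
    for prompt_id, responses in local.items():
        candidates = pool.get(prompt_id, [])
        budget = max_swaps if candidates else 0
        new_group = []
        for r in responses:
            if not r.correct and budget > 0:
                new_group.append(rng.choice(candidates))
                budget -= 1
            else:
                new_group.append(r)
        swapped[prompt_id] = new_group
    return swapped


def group_advantages(responses):
    """GRPO-style advantage (an assumption): (reward - group mean) / group std."""
    rewards = [1.0 if r.correct else 0.0 for r in responses]
    mean = sum(rewards) / len(rewards)
    std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
    if std == 0.0:
        return [0.0] * len(rewards)  # identical rewards: no learning signal
    return [(x - mean) / std for x in rewards]


# Usage: client 0 fails every sample on public prompt "p1"; client 1 solves it once.
rng = random.Random(0)
local0 = {"p1": [Response(f"wrong-{i}", False, 0) for i in range(4)]}
local1 = {"p1": [Response("right", True, 1), Response("bad", False, 1)]}

pool = pool_correct({0: local0, 1: local1})
before = group_advantages(local0["p1"])
swapped = swap_incorrect(local0, pool, rng, max_swaps=1)
after = group_advantages(swapped["p1"])

print(before)                        # [0.0, 0.0, 0.0, 0.0]
print([round(a, 3) for a in after])  # [1.732, -0.577, -0.577, -0.577]
```

The example shows one plausible reason a swap helps: when every local response to a prompt is wrong, group-normalized advantages are all zero and the client gets no learning signal, while a single swapped-in correct response restores nonzero advantages. Replacing every incorrect response would turn the group all-correct and zero the signal again, which is why this sketch caps swaps per prompt.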