Improving Zero-Shot Offline RL via Behavioral Task Sampling
arXiv cs.AI / 4/29/2026
Key Points
- The paper studies offline zero-shot reinforcement learning, where an agent must optimize reward functions it has never seen without further environment interaction.
- It argues that existing methods rely on randomly sampled task vectors, which may fail to capture the true structure of the task space and therefore harm zero-shot generalization.
- The authors propose extracting task vectors directly from the offline dataset to form a more principled task distribution for training task-conditioned policies (see the sketch after this list).
- They provide a reward-function extraction procedure that can be integrated into existing offline zero-shot RL algorithms with minimal added complexity.
- Experiments on multiple benchmarks show the proposed approach improves zero-shot performance by an average of 20% versus prior baselines.
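To make the contrast concrete, here is a minimal sketch in the spirit of forward-backward / successor-feature zero-shot RL, where a task is encoded as a vector `z` and rewards are assumed approximately linear in learned state features `phi(s)`. The function names, the identity encoder, and the least-squares inference are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_task(dim):
    """Baseline: draw a task vector from an uninformative prior (N(0, I)).

    Such samples may land in regions of task space the offline data
    does not cover at all.
    """
    z = rng.normal(size=dim)
    return z / np.linalg.norm(z)  # common unit-norm convention

def infer_task_from_data(states, rewards, phi):
    """Illustrative alternative: extract a task vector from logged data.

    Assumes rewards are approximately linear in features phi(s), so the
    task vector is the least-squares solution of phi(states) @ z ~= rewards.
    This mirrors reward-inference tricks used in forward-backward methods;
    the paper's actual extraction procedure may differ.
    """
    features = phi(states)  # (N, dim) feature matrix
    z, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    return z / np.linalg.norm(z)

# Toy usage: identity features stand in for a learned state encoder.
dim, n = 8, 512
phi = lambda s: s
states = rng.normal(size=(n, dim))
true_z = sample_random_task(dim)
rewards = states @ true_z + 0.01 * rng.normal(size=n)

z_hat = infer_task_from_data(states, rewards, phi)
print("cosine(true_z, z_hat) =", float(true_z @ z_hat))
```

The design point: a randomly drawn `z` may correspond to no task the dataset can represent, while a data-derived `z` is grounded in rewards the logged transitions actually explain.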


