Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning
arXiv cs.CV / 4/8/2026
Key Points
- The paper studies why successor representation (SR) approaches perform poorly in visual zero-shot unsupervised reinforcement learning, highlighting two main issues: attention to dynamics-irrelevant regions and degraded skill controllability under flawed successor measures.
- It proposes a new framework called Saliency-Guided Representation with Consistency Policy Learning (SRCP) that decouples representation learning from successor training to better capture dynamics-relevant features.
- SRCP introduces a saliency-guided dynamics task to improve successor measures and task generalization, addressing the representation failures of SR in high-dimensional visual settings.
- The framework also improves skill-conditioned action modeling by combining fast-sampling consistency policy learning with classifier-free guidance adapted to unsupervised reinforcement learning (URL), together with tailored training objectives.
- Experiments across 16 tasks on 4 datasets from the ExORL benchmark show that SRCP achieves state-of-the-art zero-shot generalization and can be combined with multiple existing SR methods.
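The successor-representation failures described in the first point are easiest to see from the tabular definition: under a fixed policy with transition matrix P and discount gamma, the SR is M = (I - gamma P)^-1, and values factor as V = M r, so a new reward yields a new value function without re-learning dynamics. A minimal NumPy sketch (a toy chain MDP; all numbers and names are illustrative, not from the paper):

```python
import numpy as np

# Toy 3-state chain MDP under a fixed policy: transition matrix P,
# discount gamma, reward vector r (all values hypothetical).
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
gamma = 0.95
r = np.array([0.0, 0.0, 1.0])

# Successor representation: M[s, s'] = expected discounted number of
# visits to s' when starting from s, i.e. M = (I - gamma * P)^{-1}.
M = np.linalg.inv(np.eye(3) - gamma * P)

# Values factor through M: V = M @ r. Swapping in a different reward
# vector gives new values immediately -- the basis of zero-shot transfer.
V = M @ r
```

The visual setting the paper targets replaces this exact tabular inverse with learned successor measures over image features, which is where dynamics-irrelevant regions can corrupt M.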
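Classifier-free guidance, mentioned in the policy-learning point, blends a skill-conditioned prediction with an unconditional one at sampling time, extrapolating toward the conditional direction. A minimal sketch of the standard guidance rule; the function name, weight, and example values are illustrative, not the paper's API:

```python
import numpy as np

def guided_action(cond_pred: np.ndarray,
                  uncond_pred: np.ndarray,
                  w: float) -> np.ndarray:
    """Classifier-free guidance: move from the unconditional prediction
    toward the skill-conditioned one by guidance weight w.
    w = 1 recovers the conditional prediction; w > 1 sharpens the
    influence of the skill condition."""
    return uncond_pred + w * (cond_pred - uncond_pred)

# Example: a 2-D action predicted with and without skill conditioning.
cond = np.array([0.8, -0.2])
uncond = np.array([0.2, 0.0])
action = guided_action(cond, uncond, w=1.5)  # -> [1.1, -0.3]
```

In a consistency-policy setting this blend would be applied to the model's denoising output at each of the (few) sampling steps, which is what keeps action generation fast.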

