Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning

arXiv cs.CV / 4/8/2026


Key Points

  • The paper studies why successor representation (SR) approaches perform poorly in visual zero-shot unsupervised reinforcement learning, highlighting two main issues: attention to dynamics-irrelevant regions and degraded skill controllability under flawed successor measures.
  • It proposes a new framework called Saliency-Guided Representation with Consistency Policy Learning (SRCP) that decouples representation learning from successor training to better capture dynamics-relevant features.
  • SRCP introduces a saliency-guided dynamics task to improve successor measures and task generalization, addressing the representation failures of SR in high-dimensional visual settings.
  • The framework also improves skill-conditioned action modeling by combining fast-sampling consistency policy learning with URL-specific classifier-free guidance and tailored training objectives.
  • Experiments across 16 tasks on 4 datasets from the ExORL benchmark show SRCP delivers state-of-the-art zero-shot generalization and can be used alongside multiple SR methods.
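The zero-shot property that SR methods (and hence SRCP) build on can be illustrated with a toy successor-features computation. This is a generic sketch of the standard successor-features idea, not code from the paper; the shapes, names, and random values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy successor features: psi(s, a) approximates the discounted sum of
# future state features phi(s') under a skill-conditioned policy.
n_actions, feat_dim = 4, 8
psi = rng.normal(size=(n_actions, feat_dim))  # psi(s, a) for one fixed state s

# At test time, a new task is specified only by a reward weight vector z,
# with r(s) ~= phi(s) . z.  Q-values for the new task follow without any
# additional training -- this is the zero-shot generalization mechanism:
z = rng.normal(size=feat_dim)
q_values = psi @ z                    # Q(s, a, z) = psi(s, a) . z
greedy_action = int(np.argmax(q_values))
```

The paper's diagnosis is that in visual settings the learned features attend to dynamics-irrelevant pixels, so the successor measure (and therefore `q_values`) becomes inaccurate; SRCP's saliency-guided dynamics task targets exactly this failure.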

Abstract

Zero-shot unsupervised reinforcement learning (URL) offers a promising direction for building generalist agents capable of generalizing to unseen tasks without additional supervision. Among existing approaches, successor representations (SR) have emerged as a prominent paradigm due to their effectiveness in structured, low-dimensional settings. However, SR methods struggle to scale to high-dimensional visual environments. Through empirical analysis, we identify two key limitations of SR in visual URL: (1) SR objectives often lead to suboptimal representations that attend to dynamics-irrelevant regions, resulting in inaccurate successor measures and degraded task generalization; and (2) these flawed representations hinder SR policies from modeling multi-modal skill-conditioned action distributions and ensuring skill controllability. To address these limitations, we propose Saliency-Guided Representation with Consistency Policy Learning (SRCP), a novel framework that improves zero-shot generalization of SR methods in visual URL. SRCP decouples representation learning from successor training by introducing a saliency-guided dynamics task to capture dynamics-relevant representations, thereby improving successor measures and task generalization. Moreover, it integrates a fast-sampling consistency policy with URL-specific classifier-free guidance and tailored training objectives to improve skill-conditioned policy modeling and controllability. Extensive experiments on 16 tasks across 4 datasets from the ExORL benchmark demonstrate that SRCP achieves state-of-the-art zero-shot generalization in visual URL and is compatible with various SR methods.
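The "URL-specific classifier-free guidance" mentioned above follows the standard classifier-free guidance recipe: the policy network is trained both with and without the skill conditioning, and at sampling time the two predictions are combined to sharpen skill controllability. The sketch below shows that generic combination rule; `denoise` and its signature are hypothetical stand-ins, not the paper's actual model.

```python
import numpy as np

def cfg_action_update(denoise, x, skill, w=2.0):
    """Generic classifier-free guidance step for a skill-conditioned denoiser.

    `denoise(x, skill)` is a hypothetical model call; passing skill=None
    gives the unconditional prediction (skill is randomly dropped during
    training).  The guided output pushes samples toward the
    skill-conditional mode:
        d_guided = d_uncond + w * (d_cond - d_uncond)
    """
    d_cond = denoise(x, skill)
    d_uncond = denoise(x, None)
    return d_uncond + w * (d_cond - d_uncond)

# Toy denoiser: pulls the noisy action x toward the skill vector
# (or toward zero when unconditioned).  Purely illustrative.
toy = lambda x, s: (s - x) if s is not None else -x
x = np.zeros(3)        # noisy action sample
skill = np.ones(3)     # skill embedding
guided = cfg_action_update(toy, x, skill, w=2.0)
```

With guidance weight `w > 1` the update over-weights the skill-conditional direction, which is how this family of methods trades sample diversity for controllability; a consistency policy keeps this cheap by needing only one or a few denoising steps per action.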