Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues

arXiv cs.CV / 3/24/2026


Key Points

  • The paper proposes RLVC, a reinforcement-learning framework that uses outcome-based rewards and class-wise visual cues to improve generative zero-shot learning (ZSL) beyond task-agnostic synthesized features.
  • RLVC “self-evolves” the generative model by updating it with rewards that encourage task-relevant feature synthesis, addressing cases where semantic prototypes alone cannot capture visual distinctions (a minimal illustrative sketch of such an update follows this list).
  • The method introduces visual cues to align synthesized features with visual prototypes and to stabilize the reinforcement learning training updates.
  • A novel cold-start strategy is introduced to initialize RLVC’s training.
  • Experiments on three common ZSL benchmarks report state-of-the-art performance with a 4.7% improvement over prior results.
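
The abstract does not specify the exact update rule, so the following is only a minimal REINFORCE-style sketch of what an outcome-based reward update for a conditional feature generator could look like. Everything here is an illustrative assumption: the `FeatureGenerator` Gaussian policy, the frozen probe `classifier` that scores outcomes, the binary reward, and the constant `baseline` are stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Gaussian 'policy' over visual features, conditioned on a semantic prototype.
    (Hypothetical stand-in for the paper's generative model.)"""
    def __init__(self, sem_dim: int, feat_dim: int, hidden: int = 1024):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(sem_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, feat_dim)
        self.log_std = nn.Parameter(torch.zeros(feat_dim))

    def dist(self, sem: torch.Tensor) -> torch.distributions.Normal:
        h = self.backbone(sem)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

def outcome_reward_step(gen, classifier, sem, labels, optimizer, baseline=0.5):
    """One outcome-reward update: reward a synthesized feature only if a frozen
    probe classifier assigns it to its target class (REINFORCE, constant baseline)."""
    dist = gen.dist(sem)
    feats = dist.sample()  # sampled synthesized features (non-differentiable draw)
    with torch.no_grad():
        reward = (classifier(feats).argmax(-1) == labels).float()  # outcome: correct or not
    log_prob = dist.log_prob(feats).sum(-1)          # log-likelihood of each sample
    loss = -((reward - baseline) * log_prob).mean()  # raise likelihood of rewarded samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()  # fraction of task-relevant samples this step
```

In this toy form, the classifier outcome plays the role of the task-relevance signal described above; the paper's actual reward design may differ.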

Abstract

Recent advances in zero-shot learning (ZSL) have demonstrated the potential of generative models. Typically, generative ZSL synthesizes visual features conditioned on semantic prototypes to model the data distribution of unseen classes, and then trains a classifier on the synthesized data. However, the synthesized features often remain task-agnostic, leading to degraded performance. Moreover, inferring a faithful distribution from semantic prototypes alone is insufficient for classes that are semantically similar but visually distinct. To address these issues and advance ZSL, we propose RLVC, an outcome-reward reinforcement learning (RL) framework with visual cues for generative ZSL. At its core, RL empowers the generative model to self-evolve, implicitly enhancing its generation capability. In particular, RLVC updates the generative model with an outcome-based reward that encourages the synthesis of task-relevant features. Furthermore, we introduce class-wise visual cues that (i) align synthesized features with visual prototypes and (ii) stabilize the RL training updates. For the training process, we present a novel cold-start strategy. Comprehensive experiments and analyses on three prevalent ZSL benchmarks demonstrate that RLVC achieves state-of-the-art results with a 4.7% gain.
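
As a companion to the sketch above, here is one plausible (and equally hypothetical) reading of the class-wise visual cues: visual prototypes computed as per-class means of real features, plus an alignment penalty that pulls synthesized features toward the prototype of their class. The helper names and the MSE form are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def class_prototypes(feats: torch.Tensor, labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Per-class mean of real visual features: a simple stand-in for 'visual cues'."""
    protos = torch.zeros(num_classes, feats.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():                      # skip classes absent from this batch
            protos[c] = feats[mask].mean(0)
    return protos

def visual_cue_loss(synth_feats: torch.Tensor, labels: torch.Tensor,
                    protos: torch.Tensor) -> torch.Tensor:
    """Alignment term: pull each synthesized feature toward its class prototype.
    A dense term like this could also damp the variance of sparse outcome
    rewards, which is one way to read the abstract's 'stabilize' claim."""
    return F.mse_loss(synth_feats, protos[labels])
```

A training loop under these assumptions would optimize something like the RL objective plus `lam * visual_cue_loss(...)`, where `lam` is a hypothetical weighting hyperparameter.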