Full-Gradient Successor Feature Representations

arXiv cs.LG / 4/2/2026


Key Points

  • The paper addresses instability in standard Successor Features (SF) learning, which typically uses semi-gradient TD updates and lacks strong convergence guarantees with non-linear function approximation, especially under multi-task transfer settings.
  • It introduces FG-SFRQL (Full-Gradient Successor Feature Representations Q-Learning), which learns successor features by minimizing the full Mean Squared Bellman Error rather than relying on semi-gradient approximations.
  • FG-SFRQL computes gradients with respect to parameters in both the online and target networks, aiming to stabilize and improve the quality of learned feature representations for Generalized Policy Improvement (GPI).
  • The authors provide a theoretical proof of almost-sure convergence for FG-SFRQL and report empirical gains in sample efficiency and transfer performance over semi-gradient baselines in both discrete and continuous control domains.
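To make the semi-gradient vs. full-gradient distinction concrete, here is a minimal toy sketch (not the paper's code) using a tabular successor-feature table. The residual is δ = φ(s) + γψ(s') − ψ(s); the semi-gradient update treats the bootstrapped target φ(s) + γψ(s') as a constant, while the full-gradient update of the Mean Squared Bellman Error ½‖δ‖² also propagates the gradient through ψ(s'), as FG-SFRQL does through the target network. All names (`psi`, `phi`, `msbe_grads`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Toy tabular setup: psi[s] in R^d are successor features, phi[s] in R^d
# are immediate state features. Illustrative only -- the paper works with
# non-linear function approximation and a target network.
rng = np.random.default_rng(0)
n_states, d, gamma, lr = 5, 3, 0.9, 0.1
psi = rng.normal(size=(n_states, d))   # successor-feature table
phi = rng.normal(size=(n_states, d))   # one-step feature map

def msbe_grads(psi, s, s_next):
    """Return the TD residual and the two descent directions for 0.5*||delta||^2."""
    delta = phi[s] + gamma * psi[s_next] - psi[s]
    semi = np.zeros_like(psi)
    semi[s] += delta                    # semi-gradient: target held constant
    full = semi.copy()
    full[s_next] -= gamma * delta       # full gradient also flows through psi(s')
    return delta, semi, full

delta, g_semi, g_full = msbe_grads(psi, s=0, s_next=1)
psi_full = psi + lr * g_full            # one full-gradient MSBE descent step
```

For this quadratic loss, a sufficiently small step along `g_full` is guaranteed to shrink the residual on the sampled transition, which is the mechanism behind the convergence argument sketched above.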

Abstract

Successor Features (SF) combined with Generalized Policy Improvement (GPI) provide a robust framework for transfer learning in Reinforcement Learning (RL) by decoupling environment dynamics from reward functions. However, standard SF learning methods typically rely on semi-gradient Temporal Difference (TD) updates. When combined with non-linear function approximation, semi-gradient methods lack robust convergence guarantees and can lead to instability, particularly in the multi-task setting where accurate feature estimation is critical for effective GPI. Inspired by Full Gradient DQN, we propose Full-Gradient Successor Feature Representations Q-Learning (FG-SFRQL), an algorithm that optimizes the successor features by minimizing the full Mean Squared Bellman Error. Unlike standard approaches, our method computes gradients with respect to parameters in both the online and target networks. We provide a theoretical proof of almost-sure convergence for FG-SFRQL and demonstrate empirically that minimizing the full residual leads to superior sample efficiency and transfer performance compared to semi-gradient baselines in both discrete and continuous domains.
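The decoupling the abstract describes rests on the identity Q_i(s, a) = ψ_i(s, a) · w: each source policy's successor features ψ_i are reward-agnostic, and a new task only supplies its reward weights w (with r = φ · w). GPI then acts greedily with respect to the maximum over source policies. A minimal sketch at one fixed state, with all shapes and names (`psis`, `w`) assumed for illustration:

```python
import numpy as np

# Hypothetical GPI step at a fixed state s: psis[i, a] in R^d holds the
# successor features of source policy i for action a; w holds the new
# task's reward weights, so Q_i(s, a) = psis[i, a] . w.
rng = np.random.default_rng(1)
k, n_actions, d = 3, 4, 5
psis = rng.normal(size=(k, n_actions, d))  # psi_i(s, a) for k source policies
w = rng.normal(size=d)                     # task reward weights: r = phi . w

q = psis @ w                               # shape (k, n_actions)
gpi_action = int(q.max(axis=0).argmax())   # argmax_a max_i Q_i(s, a)
```

The quality of this transfer step depends entirely on how accurately each ψ_i is estimated, which is why the paper targets the stability of SF learning itself.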
