Full-Gradient Successor Feature Representations
arXiv cs.LG / 4/2/2026
Key Points
- The paper addresses instability in standard Successor Features (SF) learning: the usual semi-gradient TD updates lack strong convergence guarantees under non-linear function approximation, especially in multi-task transfer settings.
- It introduces FG-SFRQL (Full-Gradient Successor Feature Representations Q-Learning), which learns successor features by minimizing the full Mean Squared Bellman Error rather than relying on semi-gradient approximations.
- FG-SFRQL computes gradients with respect to parameters in both the online and target networks, aiming to stabilize and improve the quality of learned feature representations for Generalized Policy Improvement (GPI).
- The authors provide a theoretical proof of almost-sure convergence for FG-SFRQL and report empirical gains in sample efficiency and transfer performance over semi-gradient baselines in both discrete and continuous control domains.
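The core distinction the paper draws — differentiating the Mean Squared Bellman Error through both the online prediction and the bootstrap target, rather than treating the target as a constant — can be illustrated with a minimal sketch. This is not the paper's FG-SFRQL implementation; it is a tabular, fixed-policy toy where all names (`phi`, `psi`, `gamma`, etc.) are hypothetical:

```python
import numpy as np

# Illustrative sketch only: contrast a semi-gradient TD update for successor
# features with a full-gradient update on the Mean Squared Bellman Error,
# in a tabular setting with a fixed policy. Not the paper's implementation.

rng = np.random.default_rng(0)
n_states, d, gamma, lr = 5, 3, 0.9, 0.1
phi = rng.normal(size=(n_states, d))   # base features phi(s)
psi = np.zeros((n_states, d))          # successor features psi(s)

def residual(psi, s, s_next):
    """Bellman residual for one transition: psi(s) - (phi(s) + gamma * psi(s'))."""
    return psi[s] - (phi[s] + gamma * psi[s_next])

def semi_gradient_step(psi, s, s_next):
    """Semi-gradient TD: the bootstrap target phi(s) + gamma * psi(s') is
    held constant, so only psi[s] receives an update."""
    delta = residual(psi, s, s_next)
    psi = psi.copy()
    psi[s] -= lr * delta               # d/d psi[s] of 0.5*||delta||^2 = +delta
    return psi

def full_gradient_step(psi, s, s_next):
    """Full gradient of 0.5 * ||delta||^2: both psi[s] and psi[s_next] appear
    in the loss, so both receive updates (the 'full-gradient' idea)."""
    delta = residual(psi, s, s_next)
    psi = psi.copy()
    psi[s] -= lr * delta               # d/d psi[s]  = +delta
    psi[s_next] -= lr * (-gamma * delta)   # d/d psi[s'] = -gamma * delta
    return psi

# One transition (s=0 -> s'=1): both updates shrink the Bellman residual,
# but the full-gradient step also adjusts the successor state's features.
psi_semi = semi_gradient_step(psi, 0, 1)
psi_full = full_gradient_step(psi, 0, 1)
```

In this toy case the full-gradient step drives the residual down faster on the sampled transition, at the cost of also moving the target-state features — the trade-off that makes the full-MSBE objective better behaved for convergence analysis than the semi-gradient fixed-point iteration.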