Distributional Value Estimation Without Target Networks for Robust Quality-Diversity
arXiv cs.LG / 4/23/2026
Key Points
- The paper presents QDHUAC, a target-free, distributional reinforcement learning algorithm designed to improve Quality-Diversity (QD) search for complex locomotion tasks.
- Standard high Update-to-Data (UTD) methods often rely on target networks for training stability, but the authors argue this adds a major computational bottleneck that limits practical use in resource-heavy QD settings.
- QDHUAC aims to provide dense, low-variance gradient signals to enable stable training at high UTD ratios and to run Dominated Novelty Search more sample-efficiently.
- Experiments on high-dimensional Brax environments show stable high-UTD training with competitive coverage and fitness, using an order of magnitude fewer environment steps than baseline approaches.
- The authors conclude that pairing target-free distributional critics with dominance-based selection can be a key ingredient for the next generation of sample-efficient evolutionary reinforcement learning algorithms.
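
The summary does not spell out how dominance-based selection scores a population, so the following is only a minimal sketch of one commonly described formulation of Dominated Novelty Search, not the authors' implementation. It assumes each solution's score is the mean descriptor-space distance to its k nearest strictly fitter solutions, and that a solution with no fitter competitor gets an infinite score. The function names, the parameter `k`, and the toy data are hypothetical.

```python
import math


def dominated_novelty_scores(fitness, descriptors, k=3):
    """Score each solution by its distance to nearby fitter solutions.

    Assumed formulation (not taken from the summary): the score is the mean
    Euclidean distance, in descriptor space, to the k nearest solutions with
    strictly higher fitness. Solutions that nothing dominates score infinity.
    """
    scores = []
    for f_i, d_i in zip(fitness, descriptors):
        # Distances to every strictly fitter solution, nearest first.
        dists = sorted(
            math.dist(d_i, d_j)
            for f_j, d_j in zip(fitness, descriptors)
            if f_j > f_i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else math.inf)
    return scores


def select_survivors(fitness, descriptors, population_size, k=3):
    """Return the indices of the population_size highest-scoring solutions."""
    scores = dominated_novelty_scores(fitness, descriptors, k)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:population_size]


# Toy example: three solutions with 1-D behaviour descriptors.
fitness = [1.0, 2.0, 3.0]
descriptors = [(0.0,), (1.0,), (3.0,)]
scores = dominated_novelty_scores(fitness, descriptors, k=1)
survivors = select_survivors(fitness, descriptors, population_size=2, k=1)
print(scores)     # [1.0, 2.0, inf]
print(survivors)  # [2, 1]
```

Under this assumption, a solution survives either because it is the best in its neighbourhood or because it sits far from anything fitter, which is how a single ranking can reward both quality and diversity without a fixed grid of niches.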