Federated Distributional Reinforcement Learning with Distributional Critic Regularization
arXiv cs.LG / 3/19/2026
Key Points
- The paper formalizes Federated Distributional Reinforcement Learning (FedDistRL), in which clients federate quantile-based value critics while preserving distributional information rather than averaging it away (see the first sketch after this list).
- It introduces TR-FedDistRL, which constrains the global critic with a per-client, risk-aware Wasserstein barycenter computed over a temporal buffer, maintaining distributional detail during federation (second sketch below).
- The distributional trust region is implemented as a shrink-squash step around the barycenter reference, so each update stays within a bounded Wasserstein neighborhood of that reference (third sketch below).
- Empirical results on bandits, a multi-agent gridworld, and a continuous highway environment show reduced mean-smearing, improved safety proxies, and lower critic/policy drift compared with mean-oriented and non-federated baselines.
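
To make the first point concrete, here is a minimal sketch, assuming the clients' critics are quantile-parameterized; the function names, shapes, and the 32-quantile grid are illustrative, not the paper's API. It contrasts mean-oriented federation, which collapses each client's return distribution to a scalar before aggregating, with federating full quantile vectors, whose component-wise average is the 1-D Wasserstein-2 barycenter and therefore keeps distributional shape:

```python
import numpy as np

def fed_avg_means(client_quantiles: list[np.ndarray]) -> float:
    """Mean-oriented federation: average only the expected returns.

    This collapses every client's distribution to a scalar before
    aggregating, discarding variance, skew, and tail information."""
    return float(np.mean([q.mean() for q in client_quantiles]))

def fed_quantile_critic(client_quantiles: list[np.ndarray]) -> np.ndarray:
    """Distribution-preserving federation: aggregate full quantile vectors.

    Averaging quantile functions component-wise is the 1-D Wasserstein-2
    barycenter under the quantile parameterization, so the aggregate
    keeps distributional shape rather than smearing it into a mean."""
    return np.mean(np.stack(client_quantiles), axis=0)

# Two hypothetical clients with the same mean but different risk profiles.
rng = np.random.default_rng(0)
taus = (np.arange(32) + 0.5) / 32
client_a = np.quantile(rng.normal(0.0, 0.1, 10_000), taus)   # low variance
client_b = np.quantile(rng.normal(0.0, 2.0, 10_000), taus)   # heavy tails

print(fed_avg_means([client_a, client_b]))        # ~0: tail info is gone
print(fed_quantile_critic([client_a, client_b]))  # tail structure survives
```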
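For the barycenter reference in the second point, one plausible reading, again assuming a quantile parameterization (where the 1-D W2 barycenter of several distributions is a weighted component-wise average of their quantile vectors), combines recency decay over the temporal buffer with a risk score that upweights snapshots with worse lower tails. The decay factor, CVaR fraction, and weighting rule below are assumptions, not the paper's exact scheme:

```python
import numpy as np

def risk_aware_barycenter(buffer: np.ndarray,
                          time_decay: float = 0.9,
                          risk_temp: float = 1.0) -> np.ndarray:
    """buffer: (T, N) array, T past rounds of N-quantile critic snapshots,
    newest snapshot last; each row sorted (a valid quantile vector)."""
    T, N = buffer.shape
    # Recency weights over the temporal buffer.
    recency = time_decay ** np.arange(T - 1, -1, -1)
    # Risk score: lower-tail CVaR of each snapshot (mean of worst 25%).
    k = max(1, N // 4)
    cvar = buffer[:, :k].mean(axis=1)
    # Upweight snapshots with worse tails so the reference stays cautious.
    risk = np.exp(-risk_temp * cvar)
    w = recency * risk
    w /= w.sum()
    # Weighted quantile-function average = 1-D W2 barycenter; re-sort to
    # keep the result a monotone (valid) quantile vector.
    return np.sort(w @ buffer)

buffer = np.sort(np.random.default_rng(1).normal(size=(5, 32)), axis=1)
reference = risk_aware_barycenter(buffer)
```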
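Finally, a minimal sketch of one reading of the shrink-squash step in the third point: if a proposed critic update leaves the W2 ball of radius eps around the barycenter reference, it is shrunk back along the W2 geodesic (a straight line in quantile coordinates) until it sits on the boundary. The radius eps and the function names are assumptions, not the paper's notation:

```python
import numpy as np

def w2_quantile(q1: np.ndarray, q2: np.ndarray) -> float:
    """W2 distance between two 1-D distributions given as quantile vectors."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))

def shrink_squash(q_new: np.ndarray, q_ref: np.ndarray,
                  eps: float) -> np.ndarray:
    """Project q_new into the W2 ball of radius eps around q_ref."""
    d = w2_quantile(q_new, q_ref)
    if d <= eps:
        return q_new                      # already inside the trust region
    # Shrink toward the reference; in quantile coordinates this is exact
    # W2 geodesic interpolation, so the result lands on the ball's boundary.
    return q_ref + (eps / d) * (q_new - q_ref)
```

Because 1-D W2 geodesics are linear in quantile space, the projection is exact rather than approximate, which is presumably what makes a closed-form shrink step viable here.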