Unified Policy Value Decomposition for Rapid Adaptation
arXiv cs.LG / 3/19/2026
Key Points
- Introduces a bilinear actor-critic decomposition where policy and value share a low-dimensional goal embedding for rapid adaptation to new tasks without retraining.
- The critic factorizes as Q(s, a, g) = Σ_k G_k(g) y_k(s, a), where G_k(g) are goal-conditioned coefficients and y_k(s, a) are learned value bases, implementing a gain-modulated, multiplicative interaction.
- The actor is extended to weight primitive policies by the same coefficients, enabling zero-shot adaptation by freezing the bases and estimating G_k(g) in a single forward pass.
- Experiments on MuJoCo Ant with eight-direction locomotion show improved rapid adaptation and suggest a biologically plausible mechanism for efficient transfer in high-dimensional RL.
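The factorization above can be sketched in a few lines of NumPy. This is a minimal illustrative example, not the paper's implementation: the linear bases, feature dimensions, and least-squares coefficient fit are all assumptions standing in for the learned networks and the paper's single-forward-pass estimator. It shows the core mechanism: with the value bases y_k(s, a) frozen, adapting to a new goal reduces to estimating the K coefficients G_k(g).

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 6  # number of value bases, state-action feature dimension (hypothetical)

# Hypothetical frozen value bases y_k(s, a): here simple linear maps of a
# state-action feature vector (the paper learns these with neural networks).
W = rng.normal(size=(K, D))

def bases(sa):
    """Return y_k(s, a) for all k; sa is a state-action feature vector."""
    return W @ sa  # shape (K,)

def q_value(sa, G):
    """Bilinear critic: Q(s, a, g) = sum_k G_k(g) * y_k(s, a)."""
    return G @ bases(sa)

# A "task" is characterized by its goal coefficients G_k(g).
G_true = rng.normal(size=K)

# Adaptation with frozen bases: recover G_k(g) for a new goal from a
# handful of (s, a, Q) samples via least squares on the basis activations.
samples = rng.normal(size=(16, D))
targets = np.array([q_value(sa, G_true) for sa in samples])
Y = np.stack([bases(sa) for sa in samples])        # (16, K) basis activations
G_est, *_ = np.linalg.lstsq(Y, targets, rcond=None)

print(np.allclose(G_est, G_true, atol=1e-6))  # → True: coefficients recovered
```

Because the targets are exactly linear in the frozen basis activations, the overdetermined least-squares fit recovers the goal coefficients without touching the bases, mirroring how the shared embedding lets both critic and actor adapt by re-weighting fixed primitives.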