SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control
arXiv cs.LG / 5/4/2026
Key Points
- The paper introduces SAVGO, a reinforcement learning method that uses a geometry-aware objective to shape policy updates in continuous action spaces using value-based similarity.
- SAVGO learns a joint state-action embedding space where action pairs with similar action-value estimates are mapped to directions with high cosine similarity, while dissimilar pairs are separated in the embedding geometry.
- Using this learned geometry, the method builds a similarity kernel over candidate actions at each update, steering policy improvement toward higher-value regions beyond what local gradient steps achieve.
- The approach unifies representation learning, value estimation, and policy optimization under a single geometry-consistent objective, while maintaining the scalability benefits of off-policy actor-critic training.
- Experiments on MuJoCo continuous-control benchmarks show improved performance over strong baselines, with ablation studies supporting the contributions of value-geometry learning and similarity-driven policy updates.
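The paper's details are not reproduced here, so the mechanism described in the key points can only be sketched under assumptions. The snippet below illustrates the core idea: embed candidate actions jointly with the state, build a cosine-similarity kernel over the candidates, and score each candidate by kernel-smoothed Q-values so the policy update is pulled toward high-value regions of the learned geometry. The embedding (`embed`), its weights, the temperature, and the function names are all hypothetical placeholders, not SAVGO's actual architecture or objective.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b_n = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return a_n @ b_n.T

def embed(state, actions, W_s, W_a):
    """Hypothetical joint state-action embedding: a single linear
    layer with tanh, standing in for whatever encoder SAVGO learns."""
    return np.tanh(state[None, :] @ W_s + actions @ W_a)

def similarity_guided_action(state, actions, q_values, W_s, W_a, temp=0.5):
    """Pick among candidate actions using a cosine-similarity kernel.

    Each candidate's score is a kernel-weighted average of the
    Q-value estimates of all candidates, so value information is
    shared between actions that are close in the embedding geometry.
    """
    z = embed(state, actions, W_s, W_a)        # (n, d) embeddings
    K = np.exp(cosine_sim(z, z) / temp)        # similarity kernel
    K /= K.sum(axis=1, keepdims=True)          # row-normalize to weights
    scores = K @ q_values                      # geometry-smoothed values
    return actions[np.argmax(scores)], scores

# Usage with random toy data (all dimensions illustrative).
rng = np.random.default_rng(0)
s_dim, a_dim, d, n = 4, 2, 8, 16
W_s = rng.normal(size=(s_dim, d))
W_a = rng.normal(size=(a_dim, d))
state = rng.normal(size=s_dim)
actions = rng.normal(size=(n, a_dim))          # n candidate actions
q_values = rng.normal(size=n)                  # critic estimates
best_action, scores = similarity_guided_action(state, actions, q_values, W_s, W_a)
```

In an actual actor-critic loop, the smoothed scores would supply a target for the policy update rather than a hard argmax; the kernel lets value estimates from geometrically similar actions reinforce each other, which is one plausible reading of "steering policy improvement toward higher-value regions beyond what local gradient steps achieve."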