SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

arXiv cs.LG / 5/4/2026


Key Points

  • The paper introduces SAVGO, a reinforcement learning method that uses a geometry-aware objective to shape policy updates in continuous action spaces using value-based similarity.
  • SAVGO learns a joint state-action embedding space where action pairs with similar action-value estimates are mapped to directions with high cosine similarity, while dissimilar pairs are separated in the embedding geometry (see the sketch after this list).
  • Using this learned geometry, the method builds a similarity kernel over candidate actions at each update, steering policy improvement toward higher-value regions beyond what local gradient steps achieve.
  • The approach unifies representation learning, value estimation, and policy optimization under a single geometry-consistent objective, while maintaining the scalability benefits of off-policy actor-critic training.
  • Experiments on MuJoCo continuous-control benchmarks show improved performance over strong baselines, with ablation studies supporting the contributions of value-geometry learning and similarity-driven policy updates.
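
To make the embedding objective concrete, here is a minimal PyTorch sketch of how cosine similarity in a joint state-action embedding could be aligned with value similarity. It is an illustration under stated assumptions, not the paper's code: the encoder `StateActionEmbedding`, the loss `value_geometry_loss`, and the exponential value-similarity target `exp(-|ΔQ|/τ)` are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateActionEmbedding(nn.Module):
    """Joint state-action encoder phi(s, a) -> unit-norm embedding.

    Hypothetical architecture; the paper's exact network is not reproduced here.
    """
    def __init__(self, state_dim, action_dim, embed_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, state, action):
        z = self.net(torch.cat([state, action], dim=-1))
        # Unit-normalize so that a dot product equals cosine similarity.
        return F.normalize(z, dim=-1)

def value_geometry_loss(phi, state, a_i, a_j, q_i, q_j, tau=1.0):
    """Match embedding cosine similarity to a value-similarity target.

    Assumed target: exp(-|Q(s,a_i) - Q(s,a_j)| / tau), so action pairs with
    near-identical value estimates are pulled toward cosine similarity 1,
    while pairs with very different values are pushed toward orthogonality.
    """
    z_i = phi(state, a_i)                         # (B, D), unit norm
    z_j = phi(state, a_j)                         # (B, D), unit norm
    cos = (z_i * z_j).sum(dim=-1)                 # cosine similarity, (B,)
    target = torch.exp(-(q_i - q_j).abs() / tau)  # in (0, 1], (B,)
    return F.mse_loss(cos, target.detach())
```

Detaching the target keeps this loss from back-propagating into the critic; since SAVGO trains representation, value, and policy under one geometry-consistent objective, the actual gradient flow in the paper may differ.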

Abstract

While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy updates directly in the action space. To bridge this gap, State-Action Value Geometry Optimization (SAVGO) is proposed: a geometry-aware RL algorithm that explicitly incorporates value-based similarity into the policy update. Specifically, SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity, while dissimilar pairs are mapped to distinct directions. This learned geometry induces a similarity kernel over the candidate actions sampled at each update, allowing policy improvement to be guided directly toward higher-value regions beyond what local gradient-based updates can reach. As a result, representation learning, value estimation, and policy optimization are unified within a single geometry-consistent objective, while preserving the scalability of off-policy actor-critic training. The method is evaluated on standard MuJoCo continuous-control benchmarks, demonstrating improvements over strong baselines on challenging high-dimensional tasks. Ablation studies are conducted to isolate the contributions of value-geometry learning and similarity-based policy updates.
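
The similarity-kernel policy step described above might be sketched as follows, continuing the PyTorch example. Everything here is an assumption for illustration: `similarity_guided_policy_loss`, the Gaussian candidate sampling, the `[-1, 1]` action clamp, and the softmax weighting that mixes critic scores with embedding cosine similarity are plausible stand-ins rather than the authors' exact update.

```python
import torch
import torch.nn.functional as F

def similarity_guided_policy_loss(policy, critic, phi, state,
                                  n_candidates=16, noise_std=0.2, beta=5.0):
    """Kernel-weighted policy improvement step (illustrative sketch).

    Samples candidate actions around the current policy output, scores them
    with the critic, weights them by a softmax combining Q estimates with
    cosine similarity to the policy action in the learned embedding space,
    and regresses the policy toward the resulting weighted target action.
    """
    with torch.no_grad():
        a_pi = policy(state)                                       # (B, A)
        # Candidates: Gaussian perturbations of the policy action,
        # clamped assuming actions bounded in [-1, 1] as in MuJoCo.
        noise = noise_std * torch.randn(n_candidates, *a_pi.shape)
        cands = (a_pi.unsqueeze(0) + noise).clamp(-1.0, 1.0)       # (N, B, A)

        s_rep = state.unsqueeze(0).expand(n_candidates, *state.shape)
        q = critic(s_rep.flatten(0, 1), cands.flatten(0, 1))
        q = q.view(n_candidates, -1)                               # (N, B)

        # Cosine similarity between candidate and policy-action embeddings
        # (phi outputs unit-norm vectors, so the dot product is the cosine).
        z_pi = phi(state, a_pi).unsqueeze(0)                       # (1, B, D)
        z_c = phi(s_rep.flatten(0, 1), cands.flatten(0, 1))
        z_c = z_c.view(n_candidates, *z_pi.shape[1:])              # (N, B, D)
        kernel = (z_c * z_pi).sum(dim=-1)                          # (N, B)

        # Softmax weights favor high-value, geometry-consistent candidates;
        # the additive mixing of q and kernel is an assumed design choice.
        w = F.softmax(beta * q + kernel, dim=0).unsqueeze(-1)      # (N, B, 1)
        a_target = (w * cands).sum(dim=0)                          # (B, A)

    # Regress the policy toward the kernel-weighted target action.
    return F.mse_loss(policy(state), a_target)
```

Because the target action is computed without gradients, a step like this slots into a standard off-policy actor-critic loop in place of (or alongside) the usual local policy-gradient update, which is consistent with the scalability claim in the abstract.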