COMPASS-Hedge: Learning Safely Without Knowing the World

arXiv cs.LG / 3/25/2026

Opinion · Ideas & Deep Analysis · Models & Research


Key Points

The paper introduces COMPASS-Hedge, a new full-information online learning algorithm designed to resolve a common “trilemma” around adversarial regret, stochastic efficiency, and baseline safety relative to a fixed comparator.
COMPASS-Hedge is claimed to achieve minimax-optimal regret in adversarial settings, instance-optimal (gap-dependent) regret in stochastic settings, and only \tilde{\mathcal{O}}(1) regret, that is, constant up to logarithmic factors, against a designated baseline policy.
The method is described as parameter-free, requiring no prior knowledge of whether the environment is adversarial or stochastic and no access to problem-dependent gap magnitudes.
The algorithm’s design combines adaptive pseudo-regret scaling with phase-based “aggression,” plus a comparator-aware mixing strategy to unify the three performance guarantees.
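For context, the points above can be read against classical Hedge (exponential weights), the textbook full-information baseline. The sketch below is not COMPASS-Hedge: it is plain Hedge with a fixed learning rate tuned to a known horizon `T`, which is exactly the kind of prior knowledge a parameter-free method avoids. The loss means, the choice of expert 0 as the "baseline", and all function names are illustrative assumptions.

```python
import math
import random


def hedge(losses, eta):
    """Run classical Hedge (exponential weights) on a loss matrix.

    losses[t][i] is the loss in [0, 1] of expert i at round t.
    Returns the learner's cumulative expected loss and each expert's
    cumulative loss. This is the textbook algorithm, not COMPASS-Hedge.
    """
    n_experts = len(losses[0])
    cum_losses = [0.0] * n_experts
    learner_loss = 0.0
    for round_losses in losses:
        # Weights proportional to exp(-eta * cumulative loss); shifting by
        # the minimum keeps the exponentials numerically stable.
        best = min(cum_losses)
        weights = [math.exp(-eta * (c - best)) for c in cum_losses]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of playing an expert drawn from probs.
        learner_loss += sum(p * l for p, l in zip(probs, round_losses))
        cum_losses = [c + l for c, l in zip(cum_losses, round_losses)]
    return learner_loss, cum_losses


def make_stochastic_losses(T, means, seed=0):
    """Draw i.i.d. Bernoulli losses (hypothetical stochastic environment)."""
    rng = random.Random(seed)
    return [[1.0 if rng.random() < m else 0.0 for m in means] for _ in range(T)]


T = 2000
means = [0.5, 0.4, 0.6, 0.55]  # assumed means; expert 1 is best on average
BASELINE = 0                   # assumed index of the designated baseline

losses = make_stochastic_losses(T, means)
eta = math.sqrt(8 * math.log(len(means)) / T)  # requires knowing T in advance
learner_loss, cum_losses = hedge(losses, eta)

regret_vs_best = learner_loss - min(cum_losses)
regret_vs_baseline = learner_loss - cum_losses[BASELINE]
worst_case_bound = math.sqrt(T * math.log(len(means)) / 2)

print(f"regret vs best expert: {regret_vs_best:.2f}")
print(f"regret vs baseline:    {regret_vs_baseline:.2f}")
print(f"worst-case bound:      {worst_case_bound:.2f}")
```

With this tuning, fixed-rate Hedge satisfies the worst-case bound sqrt(T ln K / 2) on any loss sequence in [0, 1], but it offers no dedicated gap-dependent or baseline-safety guarantee. Those are the two extra properties the paper claims to add without giving up the worst-case rate.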

Abstract

Online learning algorithms often face a fundamental trilemma: balancing regret guarantees between adversarial and stochastic settings while providing baseline safety against a fixed comparator. While existing methods excel in one or two of these regimes, they typically fail to unify all three without sacrificing optimal rates or requiring oracle access to problem-dependent parameters. In this work, we bridge this gap by introducing COMPASS-Hedge. Our algorithm is the first full-information method to simultaneously achieve: i) minimax-optimal regret in adversarial environments; ii) instance-optimal, gap-dependent regret in stochastic environments; and iii) $\tilde{\mathcal{O}}(1)$ regret relative to a designated baseline policy, up to logarithmic factors. Crucially, COMPASS-Hedge is parameter-free and requires no prior knowledge of the environment's nature or the magnitude of the stochastic suboptimality gaps. Our approach hinges on a novel integration of adaptive pseudo-regret scaling and phase-based aggression, coupled with a comparator-aware mixing strategy. To the best of our knowledge, this provides the first "best-of-three-worlds" guarantee in the full-information setting, establishing that baseline safety does not have to come at the cost of worst-case robustness or stochastic efficiency.
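For orientation, the three quantities named in the abstract can be written in standard experts notation. This is a sketch under assumed notation, not the paper's own: $K$ experts, loss vectors $\ell_t \in [0,1]^K$, learner distribution $p_t$, a designated baseline index $b$, and a stochastic gap $\Delta$ between the best and second-best mean losses. The adversarial and stochastic rates shown are the usual benchmarks for this setting and are assumed to be the ones meant; the paper's exact definitions and constants may differ.

```latex
% Regret against the best expert in hindsight, and against the baseline b
R_T = \sum_{t=1}^{T} \langle p_t, \ell_t \rangle - \min_{i \in [K]} \sum_{t=1}^{T} \ell_{t,i},
\qquad
R_T^{b} = \sum_{t=1}^{T} \langle p_t, \ell_t \rangle - \sum_{t=1}^{T} \ell_{t,b}.

% Target rates in the three regimes (assumed standard benchmarks)
\text{adversarial: } R_T = \mathcal{O}\!\left(\sqrt{T \log K}\right),
\qquad
\text{stochastic: } \mathbb{E}[R_T] = \mathcal{O}\!\left(\frac{\log K}{\Delta}\right),
\qquad
\text{baseline: } R_T^{b} = \tilde{\mathcal{O}}(1).
```

Since the best expert's cumulative loss is never larger than the baseline's, $R_T^{b} \le R_T$ always holds. The baseline guarantee is therefore the stricter demand: it asks for near-constant regret against one fixed comparator even when regret against the best expert must grow as $\sqrt{T}$.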
