Online learning with Erdős-Rényi side-observation graphs

arXiv stat.ML / 4/29/2026

Key Points

The paper studies adversarial multi-armed bandit learning where the learner can sometimes observe the losses of non-chosen arms via side-observation graphs.
It assumes each non-selected arm reveals its loss independently with an unknown fixed probability r, and proposes two algorithms tailored to different ranges of r.
For the case r ≥ (log T)/(2N), the first algorithm attains expected regret O(√((T/r) log N)) after T rounds with N arms.
For smaller r, the second algorithm attains expected regret O(√((T/r) log (N+T))); the authors also give a quick procedure that estimates which range r falls in.
The regret bounds are shown to match (up to logarithmic factors) the best performance achievable even by algorithms that are allowed to know r in advance.
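To make the two ranges of r concrete, here is a minimal sketch that evaluates the threshold (log T)/(2N) and the two bound expressions quoted above. It assumes natural logarithms and sets every hidden constant in the O(·) notation to 1, so the numbers are only indicative of scaling. The function names are hypothetical and do not come from the paper.

```python
import math


def regime_threshold(T, N):
    """Threshold (log T) / (2N) separating the two ranges of r.

    Assumption: natural logarithm.
    """
    return math.log(T) / (2 * N)


def bound_first(T, N, r):
    """sqrt((T / r) * log N), the first bound with its constant set to 1."""
    return math.sqrt((T / r) * math.log(N))


def bound_second(T, N, r):
    """sqrt((T / r) * log(N + T)), the second bound with its constant set to 1."""
    return math.sqrt((T / r) * math.log(N + T))


def which_algorithm(T, N, r):
    """Return which of the two stated bounds applies for this r."""
    return "first" if r >= regime_threshold(T, N) else "second"


if __name__ == "__main__":
    T, N = 1000, 10
    print("threshold:", regime_threshold(T, N))
    for r in (0.5, 0.1):
        name = which_algorithm(T, N, r)
        bound = bound_first(T, N, r) if name == "first" else bound_second(T, N, r)
        print(f"r={r}: {name} algorithm, bound expression = {bound:.1f}")
```

With T = 1000 and N = 10 the threshold is roughly 0.35, so r = 0.5 falls in the first range and r = 0.1 in the second.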

Abstract

We consider adversarial multi-armed bandit problems where the learner is allowed to observe losses of a number of arms beside the arm that it actually chose. We study the case where all non-chosen arms reveal their loss with a fixed but unknown probability $r$, independently of each other and the action of the learner. We propose two algorithms that work for different ranges of $r$. We show that after $T$ rounds in a bandit problem with $N$ arms, the expected regret of our first algorithm is $O(\sqrt{(T/r) \log N})$ whenever $r \ge (\log T)/(2N)$, while our second algorithm achieves a regret of $O(\sqrt{(T/r) \log (N+T)})$ for smaller values of $r$. We also give a quick estimation procedure that decides the range of $r$. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know $r$.
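The observation model in the abstract can be simulated in a few lines. The sketch below is not either of the paper's algorithms: it assumes r is known, which the paper explicitly does not, and only illustrates why side observations help. If the learner picks arm i with probability p_i, then the loss of arm i is seen with probability p_i + (1 - p_i) r, and dividing an observed loss by that probability gives an unbiased estimate. All function names are hypothetical.

```python
import random


def observation_prob(p_i, r):
    """Probability that arm i's loss is seen: chosen, or not chosen but revealed."""
    return p_i + (1.0 - p_i) * r


def observe(chosen, n_arms, r, rng):
    """The chosen arm is always observed; every other arm is revealed
    independently with probability r."""
    return [i == chosen or rng.random() < r for i in range(n_arms)]


def loss_estimates(losses, probs, observed, r):
    """Importance-weighted loss estimates (assumes r is known, unlike the paper)."""
    return [
        loss / observation_prob(p, r) if seen else 0.0
        for loss, p, seen in zip(losses, probs, observed)
    ]


def average_estimates(losses, probs, r, rounds, seed=0):
    """Monte Carlo check: the average estimate should approach the true losses."""
    rng = random.Random(seed)
    n = len(losses)
    totals = [0.0] * n
    for _ in range(rounds):
        # Sample the learner's action from its (fixed, illustrative) distribution.
        chosen = rng.choices(range(n), weights=probs)[0]
        est = loss_estimates(losses, probs, observe(chosen, n, r, rng), r)
        totals = [t + e for t, e in zip(totals, est)]
    return [t / rounds for t in totals]


if __name__ == "__main__":
    true_losses = [0.2, 0.7, 0.4]
    print(average_estimates(true_losses, [0.5, 0.3, 0.2], r=0.25, rounds=20000))
```

Running the script should print averages close to the true losses 0.2, 0.7 and 0.4. Larger r raises every observation probability, which lowers the variance of the estimates and is consistent with the 1/r dependence in the regret bounds.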
