Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arXiv cs.LG / 5/5/2026

Key Points

The paper studies linear dueling bandits under multiple simultaneous stressors, including post-serving contexts, unknown delayed feedback, and adversarial corruption with a total budget C.
It introduces an algorithm that learns to approximate post-serving contexts from pre-serving information and uses an adaptive, clipping-based weighting scheme to reduce the combined harm from corrupted and delayed observations.
The authors prove a delay-regime-agnostic regret bound of Õ(d(√T + C + D)), where d is the total feature dimension and D captures delay complexity, under standard regularity assumptions and a parametric mapping for the post-serving context.
A key theoretical contribution is showing that the costs of corruption and delay combine additively rather than multiplicatively, which avoids the compounded degradation seen in earlier approaches.
The work also provides lower bounds showing that the upper bound is nearly tight (up to a √d factor) for adversarial delays when post-serving contexts are absent.
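
The clipping-based weighting described above can be illustrated with a short Python sketch. It uses a weight of the form min(1, α / ‖x‖_{Σ⁻¹}), as in some corruption-robust linear bandit algorithms; whether the paper uses exactly this rule is an assumption, as are the linear link (feedback y with mean θᵀ(x_left − x_right)), the class name `DelayedWeightedRidge`, and the parameters `lam` and `alpha`. Feedback for a round may arrive at any later time and in any order, and the estimator only updates when it does.

```python
import numpy as np


def clipped_weight(x, sigma_inv, alpha):
    """Weight min(1, alpha / ||x||_{Sigma^{-1}}).

    Observations whose features are still poorly explored (large norm
    under Sigma^{-1}) get a small weight, which limits how much a single
    corrupted or stale label can move the estimate.
    """
    norm = float(np.sqrt(x @ sigma_inv @ x))
    return 1.0 if norm == 0.0 else min(1.0, alpha / norm)


class DelayedWeightedRidge:
    """Hypothetical weighted ridge estimator for linear dueling feedback with delays."""

    def __init__(self, d, lam=1.0, alpha=0.5):
        self.sigma = lam * np.eye(d)   # regularized weighted Gram matrix
        self.b = np.zeros(d)           # weighted sum of y * x
        self.alpha = alpha             # clipping threshold (assumed tuning parameter)
        self.pending = {}              # round -> duel feature still awaiting feedback

    def serve(self, t, x_left, x_right):
        # A duel is summarized by the difference of the two arms' features.
        self.pending[t] = np.asarray(x_left, dtype=float) - np.asarray(x_right, dtype=float)

    def receive(self, t, y):
        # Feedback for round t arrives, possibly many rounds late and out of order.
        x = self.pending.pop(t)
        w = clipped_weight(x, np.linalg.inv(self.sigma), self.alpha)
        self.sigma += w * np.outer(x, x)
        self.b += w * y * x

    def theta_hat(self):
        # Weighted ridge solution Sigma^{-1} b.
        return np.linalg.solve(self.sigma, self.b)
```

Because the weight is computed from Σ at arrival time, a label that shows up late is judged against everything learned in the meantime; this is one possible design choice, not necessarily the paper's.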

Abstract

We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose \term, which integrates a learned approximator that predicts post-serving contexts from pre-serving information. It further employs an adaptive weighting strategy that clips feature vectors to mitigate the impact of corrupted and delayed observations simultaneously. Under standard regularity conditions and a parametric post-serving mapping, we rigorously establish that our algorithm is delay-regime-agnostic, achieving a regret upper bound of $\widetilde{\mathcal{O}}(d(\sqrt{T} + \mathcal{C} + \mathcal{D}))$, where $d$ is the total feature dimension and $\mathcal{D}$ encapsulates the delay complexity. Crucially, our analysis reveals an additive cost structure between corruption and delay, avoiding the multiplicative degradation typical of prior works. We further establish lower bounds that nearly match our upper bounds up to a $\sqrt{d}$ factor for adversarial delays in the absence of post-serving contexts.
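
The learned approximator can be sketched in the same spirit. As a hypothetical instance of the "parametric post-serving mapping", assume the post-serving context is linear in the pre-serving one, z ≈ Wᵀx. The illustrative class `PostServingApproximator` fits W by ridge regression on pairs (x, z) revealed after serving and forms the full feature by concatenating x with the prediction ẑ. The linear form, the concatenation (one possible reading of "total feature dimension"), and all names here are assumptions, not the paper's construction.

```python
import numpy as np


class PostServingApproximator:
    """Hypothetical ridge-regression estimate of a linear map x -> z.

    x: pre-serving context (known before the duel is served), dimension d_x.
    z: post-serving context (revealed only after serving), dimension d_z.
    """

    def __init__(self, d_x, d_z, lam=1.0):
        self.gram = lam * np.eye(d_x)      # lam * I + sum of x x^T
        self.cross = np.zeros((d_x, d_z))  # sum of x z^T

    def update(self, x, z):
        # Called once the post-serving context z for a served context x is observed.
        self.gram += np.outer(x, x)
        self.cross += np.outer(x, z)

    def predict(self, x):
        # Ridge solution W_hat = (lam * I + X^T X)^{-1} X^T Z, then z_hat = W_hat^T x.
        w_hat = np.linalg.solve(self.gram, self.cross)
        return np.asarray(x, dtype=float) @ w_hat

    def full_feature(self, x):
        # Total feature of dimension d_x + d_z: observed part plus predicted part.
        return np.concatenate([np.asarray(x, dtype=float), self.predict(x)])
```

Before z is revealed, an agent following this sketch would score arms with `full_feature(x)`; once z is observed, `update(x, z)` refines the map for later rounds.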
