Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arXiv cs.LG / 5/5/2026

📰 NewsModels & Research

Key Points

  • The paper studies linear dueling bandits under multiple simultaneous stressors, including post-serving contexts, unknown delayed feedback, and adversarial corruption with a total budget C.
  • It introduces an algorithm that learns to approximate post-serving contexts from pre-serving information and uses an adaptive, clipping-based weighting scheme to reduce the combined harm from corrupted and delayed observations.
  • The authors prove a delay-regime-agnostic regret bound of ~O(d(√T + C + D)), where D captures delay complexity, under standard regularity assumptions and a parametric mapping for the post-serving context.
  • A key theoretical contribution is showing an additive (rather than multiplicative) interaction between the costs of corruption and delay, improving upon degradation patterns in earlier approaches.
  • The work also provides near-matching lower bounds, indicating that their upper bounds are essentially tight for adversarial delays when post-serving contexts are not present (up to a √d factor).

Abstract

We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget \mathcal{C}. To address these challenges, we propose \term, which integrates a learned approximator that predicts post-serving contexts from pre-serving information. It further employs an adaptive weighting strategy that clips feature vectors to mitigate the impact of corrupted and delayed observations simultaneously. Under standard regularity conditions and a parametric post-serving mapping, we rigorously establish that our algorithm is delay-regime-agnostic, achieving a regret upper bound of \widetilde{\mathcal{O}}(d(\sqrt{T} + \mathcal{C} + \mathcal{D})), where d is the total feature dimension and \mathcal{D} encapsulates the delay complexity. Crucially, our analysis reveals an additive cost structure between corruption and delay, avoiding the multiplicative degradation typical of prior works. We further establish lower bounds that nearly match our upper bounds up to a \sqrt{d} factor for adversarial delays in the absence of post-serving contexts.