Online combinatorial optimization with stochastic decision sets and adversarial losses
arXiv cs.LG · April 29, 2026
Key Points
- The paper addresses sequential learning when the available actions are not fixed, but instead are stochastic “composite actions” whose components may be unavailable due to real-world failures or constraints.
- It introduces learning algorithms derived from the Follow-The-Perturbed-Leader framework, tailored to multiple feedback models including full information, (semi-)bandit, and an intermediate restricted-information setting.
- A key contribution is a new loss estimation method called “Counting Asleep Times,” designed to handle the learner’s partial observability when action availability changes over time.
- The authors prove regret bounds for each feedback model and show that their results notably improve the best-known guarantees for an efficient stochastic sleeping-bandit algorithm.
- Empirical evaluations further show that the proposed methods outperform existing approaches for these stochastic-availability problems.
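The Follow-The-Perturbed-Leader idea in the bullets above can be illustrated with a minimal sketch. The class below is **not** the paper's algorithm: it is a simplified full-information sleeping-experts learner, where the exponential perturbation scale `eta` and the `awake` counters (a loose stand-in for the bookkeeping behind "Counting Asleep Times") are assumptions made for illustration only.

```python
import random

class SleepingFTPL:
    """Illustrative FTPL sketch for sleeping experts (full-information feedback).

    Each round only a subset of the base actions is available ("awake").
    This is a hypothetical simplification of the paper's setting, not its
    actual algorithm or loss estimator.
    """

    def __init__(self, n_actions, eta=0.1, seed=0):
        self.n = n_actions
        self.eta = eta                      # perturbation scale (assumed)
        self.rng = random.Random(seed)
        self.cum_loss = [0.0] * n_actions   # cumulative observed losses
        self.awake = [0] * n_actions        # rounds each action was available

    def act(self, available):
        # Perturb cumulative losses with exponential noise and follow the
        # leader among the actions that are available this round.
        perturbed = {
            a: self.cum_loss[a] - self.rng.expovariate(1.0) / self.eta
            for a in available
        }
        return min(perturbed, key=perturbed.get)

    def update(self, available, losses):
        # Full information: losses of all awake actions are observed, so no
        # importance-weighted estimator is needed in this simplified sketch.
        for a in available:
            self.awake[a] += 1
            self.cum_loss[a] += losses[a]
```

Under bandit or restricted feedback the learner observes only part of `losses`, which is where the paper's "Counting Asleep Times" estimator would replace the direct update above.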