Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

arXiv cs.LG / 5/4/2026

📰 NewsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper introduces a new fairness framework for budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF), where individual arm contributions are not directly observable.
  • To quantify arm contributions under this harder feedback setting, the authors extend Shapley values to a K-Shapley value that limits marginal contributions to coalitions of size at most K and prove it is uniquely characterized by key cooperative-game properties.
  • They propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates K-Shapley values when the valuation function is unknown.
  • The method is designed to both learn the valuation function from full feedback and reduce Monte Carlo approximation noise, with a theoretical fairness-regret guarantee of O(T^{3/4}).
  • Experiments on federated learning and social influence maximization datasets show improved fairness and effectiveness compared with existing baselines.

Abstract

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the K-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most K. We show that K-Shapley value is a unique solution concept that satisfies Symmetry, Linearity, Null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates K-Shapley value with unknown valuation function. Unlike standard bandit literature on full bandit feedback, K-SVFair-FBF not only learns the valuation function under full feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves O(T^{3/4}) regret bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.