Creator Incentives in Recommender Systems: A Cooperative Game-Theoretic Approach for Stable and Fair Collaboration in Multi-Agent Bandits

arXiv cs.LG / 4/13/2026


Key Points

  • The paper studies incentive design among multiple content creators in recommendation systems by modeling user feedback as a multi-agent stochastic linear bandit with transferable utility (TU) cooperative game structure.
  • For homogeneous agents with fixed action sets, the authors show the resulting TU game is convex under mild conditions, guaranteeing a non-empty core that includes the Shapley value to deliver stability and fairness.
  • For heterogeneous agents, the game still has a non-empty core, but convexity and core membership of the Shapley value are no longer assured, motivating alternative payout mechanisms.
  • The authors introduce a regret-based payout rule that lies in the core and satisfies three of four Shapley axioms, aiming to achieve fairer collaboration under more general settings.
  • Experiments on MovieLens-100k analyze when empirical payouts match Shapley-based fairness and when they diverge across different environments and learning algorithms.
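The coalition-game concepts in the points above (convexity, the core, the Shapley value) can be illustrated on a small TU game. The sketch below is generic, not the paper's construction: the 3-player characteristic function `v` is hypothetical, standing in for the negative-cumulative-regret coalition values the paper defines.

```python
from itertools import combinations, permutations
from math import factorial

# Hypothetical 3-player TU game: v(S) is the value of coalition S.
# (In the paper, v(S) would be the negative sum of S's cumulative regrets.)
v = {
    frozenset(): 0.0,
    frozenset({1}): -3.0, frozenset({2}): -3.0, frozenset({3}): -3.0,
    frozenset({1, 2}): -4.0, frozenset({1, 3}): -4.0, frozenset({2, 3}): -4.0,
    frozenset({1, 2, 3}): -4.5,
}
players = [1, 2, 3]

def shapley(v, players):
    """Shapley value: average marginal contribution over all player orderings."""
    phi = {i: 0.0 for i in players}
    for order in permutations(players):
        coalition = frozenset()
        for i in order:
            phi[i] += v[coalition | {i}] - v[coalition]
            coalition = coalition | {i}
    return {i: phi[i] / factorial(len(players)) for i in players}

def is_convex(v, players):
    """Convexity: v(S∪{i}) - v(S) <= v(T∪{i}) - v(T) whenever S ⊆ T, i ∉ T."""
    for i in players:
        rest = [p for p in players if p != i]
        subsets = [frozenset(c) for r in range(len(rest) + 1)
                   for c in combinations(rest, r)]
        for S in subsets:
            for T in subsets:
                if S <= T and v[S | {i}] - v[S] > v[T | {i}] - v[T] + 1e-12:
                    return False
    return True

def in_core(payout, v, players):
    """Core: payout is efficient and no coalition can do better on its own."""
    if abs(sum(payout.values()) - v[frozenset(players)]) > 1e-9:
        return False
    for r in range(1, len(players)):
        for c in combinations(players, r):
            if sum(payout[i] for i in c) < v[frozenset(c)] - 1e-9:
                return False
    return True

phi = shapley(v, players)
print(phi)                    # symmetric game -> equal shares of v(N)
print(is_convex(v, players))  # convex => core is non-empty and contains phi
print(in_core(phi, v, players))
```

In this symmetric example each player receives v(N)/3 = -1.5, the game is convex, and the Shapley value indeed lies in the core, mirroring the homogeneous-agent result.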

Abstract

User interactions in online recommendation platforms create interdependencies among content creators: feedback on one creator's content influences the system's learning and, in turn, the exposure of other creators' content. To analyze incentives in such settings, we model collaboration as a multi-agent stochastic linear bandit problem with a transferable utility (TU) cooperative game formulation, where a coalition's value equals the negative sum of its members' cumulative regrets. We show that, for identical (homogeneous) agents with fixed action sets, the induced TU game is convex under mild algorithmic conditions, implying a non-empty core that contains the Shapley value and ensures both stability and fairness. For heterogeneous agents, the game still admits a non-empty core, though convexity and core membership of the Shapley value are no longer guaranteed. To address this, we propose a simple regret-based payout rule that satisfies three of the four Shapley axioms and also lies in the core. Experiments on the MovieLens-100k dataset illustrate when the empirical payout aligns with -- and diverges from -- Shapley fairness across different settings and algorithms.
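To make the bandit side of the formulation concrete, the toy simulation below scores a coalition by the negative sum of its members' cumulative pseudo-regrets when they pool observations in a stochastic linear bandit. The epsilon-greedy ridge learner, the random action set, and all parameters are illustrative assumptions, not the algorithms or environments evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 10, 300

# Hidden reward parameter and a fixed, shared action set (illustrative).
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
opt = (actions @ theta).max()  # optimal expected reward per pull

def coalition_value(members, lam=1.0, noise=0.1, eps=0.1):
    """v(S) = negative sum of cumulative pseudo-regrets when members of S
    pool all observations into one shared ridge-regression estimate.
    The epsilon-greedy learner is a stand-in for the paper's algorithms."""
    if not members:
        return 0.0
    A = lam * np.eye(d)  # regularized Gram matrix of pooled features
    b = np.zeros(d)      # pooled reward-weighted feature sums
    regret = 0.0
    for t in range(T):
        est = np.linalg.solve(A, b)  # shared ridge estimate of theta
        for _ in members:            # each member pulls one arm per round
            if rng.random() < eps:
                k = int(rng.integers(K))           # explore
            else:
                k = int(np.argmax(actions @ est))  # exploit shared estimate
            x = actions[k]
            r = x @ theta + noise * rng.normal()
            regret += opt - x @ theta              # pseudo-regret of this pull
            A += np.outer(x, x)
            b += r * x
    return -regret

# Larger coalitions gather more data per round, so the shared estimate
# sharpens faster; the resulting per-member regret reduction is the
# cooperative surplus the TU game redistributes.
print(coalition_value([1]), coalition_value([1, 2]))
```

Since pseudo-regret is a sum of non-negative gaps, every coalition value is at most zero, matching the sign convention in the abstract.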