Cooperate to Compete: Strategic Data Generation and Incentivization Framework for Coopetitive Cross-Silo Federated Learning

arXiv cs.AI / 4/17/2026

Key Points

  • The paper addresses a central tension in cross-silo federated learning (CFL): participants cooperate during training but compete in downstream markets, so their training contributions can inadvertently strengthen rivals.
  • Existing competition-aware incentive methods reward marginal contributions but do not properly account for the utility loss to participants caused by strengthening competitors, especially under non-IID data that yields uneven learning gains.
  • It proposes CoCoGen+, a framework that jointly models non-IID data and inter-organizational competition while treating GenAI-based synthetic data generation as an endogenous strategic decision.
  • Each training round is formulated as a weighted potential game in which organizations choose how much synthetic data to generate, balancing expected learning gains against computational costs and competition-driven utility losses. The authors characterize the equilibrium and derive implementable generation strategies that maximize social welfare.
  • A payoff-redistribution incentive mechanism compensates organizations for their contributions and for competition-caused utility losses, with the aim of sustaining long-term collaboration. Experiments on several learning tasks show CoCoGen+ outperforming baselines in efficiency.
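
The weighted potential game mentioned above can be illustrated with a small toy model. The sketch below is not the paper's formulation: the utility function, the parameter values, and all names (ALPHA, KAPPA, COST, GRID, gain, and so on) are assumptions chosen only so that the weighted-potential property holds and is easy to check. Each organization i picks a synthetic data amount x_i; its utility is a net weight w_i = ALPHA[i] - KAPPA[i] (learning gain minus competition-caused loss) times a shared concave gain, minus a quadratic generation cost. With this form, any unilateral change in utility equals w_i times the change in a single potential function, which is the defining property of a weighted potential game, so best-response updates must terminate at a Nash equilibrium on the finite grid.

```python
import math

# Hypothetical toy parameters (assumptions for illustration, not from the paper).
ALPHA = [1.0, 0.8, 0.6]    # learning-gain sensitivity; unequal values mimic non-IID data
KAPPA = [0.2, 0.3, 0.1]    # utility lost per unit of global gain because rivals also improve
COST = [0.05, 0.04, 0.06]  # quadratic cost coefficient for generating synthetic data
GRID = [0.5 * k for k in range(21)]  # candidate generation amounts 0.0 .. 10.0


def gain(total):
    """Shared concave learning gain from the total amount of synthetic data."""
    return math.log1p(total)


def weight(i):
    """Net benefit weight w_i; must be positive for the weighted potential to exist."""
    return ALPHA[i] - KAPPA[i]


def utility(i, x):
    """Toy utility: net weighted gain minus own generation cost."""
    return weight(i) * gain(sum(x)) - COST[i] * x[i] ** 2


def potential(x):
    """Potential whose unilateral changes, scaled by w_i, match utility changes."""
    return gain(sum(x)) - sum(COST[j] / weight(j) * x[j] ** 2 for j in range(len(x)))


def best_response(i, x):
    """Grid value maximizing organization i's utility with the others held fixed."""
    return max(GRID, key=lambda v: utility(i, x[:i] + [v] + x[i + 1:]))


def best_response_dynamics(x, max_rounds=100):
    """Sequential best responses; returns the final profile and the potential trace."""
    x = list(x)
    trace = [potential(x)]
    for _ in range(max_rounds):
        changed = False
        for i in range(len(x)):
            trial = x[:i] + [best_response(i, x)] + x[i + 1:]
            if utility(i, trial) > utility(i, x) + 1e-12:  # strict improvement only
                x = trial
                trace.append(potential(x))
                changed = True
        if not changed:  # nobody can improve unilaterally: Nash equilibrium on the grid
            break
    return x, trace


if __name__ == "__main__":
    profile, trace = best_response_dynamics([0.0, 0.0, 0.0])
    print("equilibrium generation amounts:", profile)
    print("potential after each improving move:", trace)
```

In this toy setting, raising an organization's KAPPA value lowers its net weight and therefore its equilibrium generation amount, which loosely mirrors the intuition that stronger competition dampens the willingness to contribute.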

Abstract

In data-sensitive domains such as healthcare, cross-silo federated learning (CFL) allows organizations to collaboratively train AI models without sharing raw data. However, practical CFL deployments are inherently coopetitive: organizations cooperate during model training while competing in downstream markets. In such settings, training contributions, including data volume, quality, and diversity, can improve the global model yet inadvertently strengthen rivals. This dilemma is amplified by non-IID data, which leads to asymmetric learning gains and undermines sustained participation. While existing competition-aware CFL and incentive-design approaches reward organizations based on marginal training contributions, they fail to account for the costs of strengthening competitors. In this paper, we introduce CoCoGen+, a coopetition-compatible data generation and incentivization framework that jointly models non-IID data and inter-organizational competition while endogenizing GenAI-based synthetic data generation as a strategic decision. Specifically, CoCoGen+ formulates each training round as a weighted potential game, where organizations strategically decide how much synthetic data to generate by balancing learning performance gains against computational costs and competition-caused utility losses. We then provide a tractable equilibrium characterization and derive implementable generation strategies to maximize social welfare. To promote long-term collaboration, we integrate a payoff redistribution-based incentive mechanism to compensate organizations for their contributions and competition-caused utility degradation. Experiments on varying learning tasks validate the feasibility of CoCoGen+. The results show how non-IID data, competition intensity, and incentives shape organizational strategies and social welfare, while CoCoGen+ outperforms baselines in efficiency.
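
The abstract does not spell out the payoff redistribution rule, so the following is only a minimal sketch of one budget-balanced scheme that fits the description. It assumes, hypothetically, that each organization pays a fixed fraction of its raw payoff into a pool, and that the pool is then split in proportion to a "claim" combining the organization's contribution and its competition-caused utility loss. The function name `redistribute`, the `tax_rate` parameter, and the claim definition are illustrative assumptions, not the mechanism used in CoCoGen+.

```python
def redistribute(raw_payoffs, claims, tax_rate=0.2):
    """Budget-balanced payoff redistribution (hypothetical rule, not from the paper).

    raw_payoffs: non-negative payoff of each organization before transfers.
    claims: non-negative compensation claim of each organization, e.g. its
        contribution plus its competition-caused utility loss (an assumption).
    tax_rate: fraction of every raw payoff that goes into the shared pool.
    """
    if len(raw_payoffs) != len(claims):
        raise ValueError("raw_payoffs and claims must have the same length")
    if not 0.0 <= tax_rate <= 1.0:
        raise ValueError("tax_rate must lie in [0, 1]")
    if any(p < 0 for p in raw_payoffs) or any(c < 0 for c in claims):
        raise ValueError("payoffs and claims must be non-negative")

    total_claims = sum(claims)
    if total_claims == 0:
        # Nobody claims compensation, so no transfers are made.
        return list(raw_payoffs)

    pool = tax_rate * sum(raw_payoffs)
    # Each organization keeps (1 - tax_rate) of its payoff and receives a
    # share of the pool proportional to its claim.
    return [
        (1.0 - tax_rate) * p + pool * c / total_claims
        for p, c in zip(raw_payoffs, claims)
    ]


if __name__ == "__main__":
    final = redistribute([10.0, 6.0, 4.0], claims=[1.0, 3.0, 1.0], tax_rate=0.5)
    print(final)
```

Because the pool is funded entirely by the participants and fully paid out, total welfare is unchanged by the transfers; only its split moves toward organizations with larger claims.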