Population-Aware Imitation Learning in Mean-field Games with Common Noise

arXiv cs.LG / 5/6/2026


Key Points

  • The paper studies imitation learning in Mean Field Games (MFGs) where agents face common noise and the population distribution evolves stochastically.
  • It argues that this stochasticity requires population-aware policies that react to aggregate (population-level) shocks.
  • The authors define two learning objectives, (1) recovering a Nash equilibrium and (2) maximizing performance against an expert population, and evaluate two imitation proxies: Behavioral Cloning (BC) and an Adversarial (ADV) divergence.
  • They provide finite-sample error bounds showing that minimizing these imitation proxies controls both policy exploitability and performance gaps versus the expert.
  • The authors propose a numerical framework that combines generalized Fictitious Play with deep learning to compute expert population-aware policies; experiments on three environments show that standard population-unaware policies fail to capture equilibrium dynamics under common noise.
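
The claim in the key points above, that common noise makes population-unaware policies inadequate, can be illustrated with a small behavioral-cloning sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state toy environment, the crowd-avoiding `expert_action`, the shock sizes, the coarse population summary used by `aware`, and the tabular count-based estimator (the paper itself uses deep learning). It only shows why a policy that conditions on the agent's own state alone can leave a large imitation loss when the expert reacts to an aggregate shock.

```python
import math
import random
from collections import Counter, defaultdict


def expert_action(crowd):
    """Hypothetical expert: head for whichever state is less crowded.

    `crowd` is the fraction of the population currently in state 1.
    """
    return 0 if crowd > 0.5 else 1


def sample_demos(n, rng):
    """Draw (state, crowd, action) triples from the toy expert population."""
    demos = []
    for _ in range(n):
        shock = rng.choice([-0.3, 0.3])            # common noise, shared by all agents
        crowd = 0.5 + shock + rng.uniform(-0.05, 0.05)
        state = 1 if rng.random() < crowd else 0   # the sampled agent's own state
        demos.append((state, crowd, expert_action(crowd)))
    return demos


def fit_bc(demos, features):
    """Tabular behavioral cloning: count expert actions per feature value."""
    counts = defaultdict(Counter)
    for state, crowd, action in demos:
        counts[features(state, crowd)][action] += 1
    return counts


def bc_loss(counts, demos, features, smoothing=1e-3):
    """Average negative log-likelihood of expert actions under the cloned policy."""
    total = 0.0
    for state, crowd, action in demos:
        c = counts[features(state, crowd)]
        prob = (c[action] + smoothing) / (sum(c.values()) + 2 * smoothing)
        total -= math.log(prob)
    return total / len(demos)


def unaware(state, crowd):
    """Population-unaware input: the agent's own state only."""
    return state


def aware(state, crowd):
    """Population-aware input: own state plus an assumed coarse crowd summary."""
    return (state, crowd > 0.5)


rng = random.Random(0)
demos = sample_demos(2000, rng)
unaware_loss = bc_loss(fit_bc(demos, unaware), demos, unaware)
aware_loss = bc_loss(fit_bc(demos, aware), demos, aware)
print(f"population-unaware BC loss: {unaware_loss:.3f}")
print(f"population-aware BC loss:   {aware_loss:.3f}")
```

In this toy, the population-aware loss should come out close to zero, while the population-unaware loss cannot, because the same own state is paired with different expert actions depending on which shock occurred. The losses are measured on the training demonstrations, so this is a statement about representational capacity, not generalization.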

Abstract

Mean Field Games (MFGs) provide a powerful framework for modeling the collective behavior of large populations of interacting agents. In this paper, we address the problem of Imitation Learning (IL) in MFGs subject to common noise, where the population distribution evolves stochastically. This stochasticity compels agents to adopt population-aware policies to respond to aggregate shocks. We formulate two distinct learning objectives: recovering a Nash equilibrium and maximizing performance against an expert population. We investigate two imitation proxies: Behavioral Cloning (BC) and Adversarial (ADV) divergence. We then establish finite-sample error bounds showing that minimizing these proxies effectively controls both the policy's exploitability and its performance gap relative to the expert. Furthermore, we propose a numerical framework using generalized Fictitious Play and Deep Learning to compute expert population-aware policies. Through experiments on three environments, we demonstrate that standard population-unaware policies fail to capture the equilibrium dynamics. Our results highlight that learning population-aware policies is crucial to avoid being misled by the randomness inherent in common noise.
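
For readers unfamiliar with the two quantities the abstract says the bounds control, the following is one plausible way to write them down. The notation is assumed here and may differ from the paper's: the exploitability follows the standard mean-field-game definition, while the form of the performance gap is a guess at what "performance against an expert population" means, not a formula taken from the paper.

```latex
% Notation assumed for illustration; the paper's own symbols may differ.
% \mu^{\pi}    : random population flow (driven by the common noise)
%                when every agent plays the policy \pi.
% J(\pi', \mu) : expected return of a single agent playing \pi' against
%                the flow \mu, with the expectation also taken over the
%                common noise.
% \pi^{E}      : the expert policy.
\begin{align*}
  \mathcal{E}(\pi) &= \sup_{\pi'} J\bigl(\pi', \mu^{\pi}\bigr) - J\bigl(\pi, \mu^{\pi}\bigr) \;\ge\; 0, \\
  \pi \text{ is a Nash equilibrium} &\iff \mathcal{E}(\pi) = 0, \\
  \mathrm{Gap}(\pi) &= J\bigl(\pi^{E}, \mu^{\pi^{E}}\bigr) - J\bigl(\pi, \mu^{\pi^{E}}\bigr).
\end{align*}
```

Under this reading, the first objective asks for a learned policy with small exploitability, and the second asks for a policy whose return against the expert-induced population flow is close to, or better than, the expert's own.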