On the Exploitability of FTRL Dynamics

arXiv cs.LG / 4/8/2026


Key Points

  • The paper analyzes how exploitable a Follow-the-Regularized-Leader (FTRL) learner with constant step size can be in two-player zero-sum games against a clairvoyant optimizer across T rounds.
  • It argues exploitability is intrinsic to the FTRL algorithm family, not due to particular regularizer or implementation choices.
  • For a fixed optimizer, the authors prove a sweeping lower bound of order Ω(N/η), showing that exploitation grows with the number N of the learner’s suboptimal actions and vanishes when no suboptimal actions exist.
  • For an alternating (randomized) optimizer, they show a guaranteed surplus of order Ω(ηT/poly(n,m)) with high probability in random games, regardless of equilibrium structure.
  • The study finds a geometric dichotomy based on regularizer steepness: non-steep regularizers enable fast finite-time elimination of suboptimal actions (high leverage), while steep regularizers make the exploitation correction smaller and potentially slower; it also proposes a susceptibility metric to compare regularizers under payoff uncertainty.

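To make the setting concrete, here is a minimal sketch (not the paper's construction) of a common FTRL instance, the entropic regularizer with constant step size η, i.e. exponential weights, playing a zero-sum matrix game against a clairvoyant opponent that best-responds to the learner's current mixed strategy each round. The matching-pennies matrix and all parameter values are illustrative assumptions; they show the mechanism of exploitation, not the paper's lower-bound instances.

```python
import numpy as np

def ftrl_entropic(A, eta=0.5, T=2000):
    """FTRL with an entropic regularizer (exponential weights) at constant
    step size eta. The learner plays the rows of zero-sum payoff matrix A
    (row player receives A[i, j]); the clairvoyant column opponent
    best-responds to the learner's current mixed strategy each round.
    Returns the learner's average per-round payoff."""
    n, m = A.shape
    cum_payoff = np.zeros(n)   # cumulative payoff of each row action
    total = 0.0
    for _ in range(T):
        # Entropic FTRL plays x_t proportional to exp(eta * cumulative payoff);
        # subtracting the max is a standard numerical-stability shift.
        w = np.exp(eta * (cum_payoff - cum_payoff.max()))
        x = w / w.sum()
        # Clairvoyant optimizer picks the column minimizing the learner's payoff.
        j = int(np.argmin(x @ A))
        total += float(x @ A[:, j])
        cum_payoff += A[:, j]
    return total / T

# Matching pennies has game value 0; against a best-responding opponent the
# FTRL learner's average payoff stays strictly below the value for finite T.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
avg = ftrl_entropic(A, eta=0.5, T=2000)
```

Because the opponent best-responds, each round's payoff is min_j x·A[:, j] ≤ value of the game, so the learner's average payoff can only fall short of the equilibrium value; the per-round gap is the surplus the optimizer extracts.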
Abstract

In this paper we investigate the exploitability of a Follow-the-Regularized-Leader (FTRL) learner with constant step size η in n×m two-player zero-sum games played over T rounds against a clairvoyant optimizer. In contrast with prior analyses, we show that exploitability is an inherent feature of the FTRL family rather than an artifact of specific instantiations. First, for a fixed optimizer, we establish a sweeping law of order Ω(N/η), proving that exploitation scales with the number N of the learner's suboptimal actions and vanishes in their absence. Second, for an alternating optimizer, a surplus of Ω(ηT/poly(n, m)) can be guaranteed with high probability in random games, regardless of the equilibrium structure. Our analysis once more uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to extract maximum surplus via finite-time elimination of suboptimal actions, whereas steep ones introduce a vanishing correction that may delay exploitation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty and propose a susceptibility measure to quantify which regularizers are most vulnerable to strategic manipulation.
