On the Exploitability of FTRL Dynamics

arXiv cs.LG / 4/8/2026


Key Points

  • The paper analyzes how exploitable a Follow-the-Regularized-Leader (FTRL) learner with constant step size can be in two-player zero-sum games against a clairvoyant optimizer across T rounds.
  • It argues exploitability is intrinsic to the FTRL algorithm family, not due to particular regularizer or implementation choices.
  • For a fixed optimizer, the authors prove a sweeping lower bound of order Ω(N/η), showing that exploitation grows with the number N of the learner’s suboptimal actions and vanishes when no suboptimal actions exist.
  • For an alternating (randomized) optimizer, they show a guaranteed surplus of order Ω(ηT/poly(n,m)) with high probability in random games, regardless of equilibrium structure.
  • The study finds a geometric dichotomy based on regularizer steepness: non-steep regularizers enable fast finite-time elimination of suboptimal actions (high leverage), while steep regularizers make the exploitation correction smaller and potentially slower; it also proposes a susceptibility metric to compare regularizers under payoff uncertainty.

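To make the setting concrete, here is a minimal sketch (not the paper's construction) of a common FTRL instance, the entropic regularizer with constant step size η, i.e. exponential weights, playing a zero-sum matrix game against a clairvoyant opponent that best-responds to the learner's current mixed strategy each round. The matching-pennies matrix and all parameter values are illustrative assumptions; they show the mechanism of exploitation, not the paper's lower-bound instances.

```python
import numpy as np

def ftrl_entropic(A, eta=0.5, T=2000):
    """FTRL with an entropic regularizer (exponential weights) at constant
    step size eta. The learner plays the rows of zero-sum payoff matrix A
    (row player receives A[i, j]); the clairvoyant column opponent
    best-responds to the learner's current mixed strategy each round.
    Returns the learner's average per-round payoff."""
    n, m = A.shape
    cum_payoff = np.zeros(n)   # cumulative payoff of each row action
    total = 0.0
    for _ in range(T):
        # Entropic FTRL plays x_t proportional to exp(eta * cumulative payoff);
        # subtracting the max is a standard numerical-stability shift.
        w = np.exp(eta * (cum_payoff - cum_payoff.max()))
        x = w / w.sum()
        # Clairvoyant optimizer picks the column minimizing the learner's payoff.
        j = int(np.argmin(x @ A))
        total += float(x @ A[:, j])
        cum_payoff += A[:, j]
    return total / T

# Matching pennies has game value 0; against a best-responding opponent the
# FTRL learner's average payoff stays strictly below the value for finite T.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
avg = ftrl_entropic(A, eta=0.5, T=2000)
```

Because the opponent best-responds, each round's payoff is min_j x·A[:, j] ≤ value of the game, so the learner's average payoff can only fall short of the equilibrium value; the per-round gap is the surplus the optimizer extracts.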
Abstract

In this paper we investigate the exploitability of a Follow-the-Regularized-Leader (FTRL) learner with constant step size η in n×m two-player zero-sum games played over T rounds against a clairvoyant optimizer. In contrast with prior analyses, we show that exploitability is an inherent feature of the FTRL family rather than an artifact of specific instantiations. First, for a fixed optimizer, we establish a sweeping law of order Ω(N/η), proving that exploitation scales with the number N of the learner's suboptimal actions and vanishes in their absence. Second, for an alternating optimizer, a surplus of Ω(ηT/poly(n, m)) can be guaranteed with high probability in random games, regardless of the equilibrium structure. Our analysis once more uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to extract maximum surplus via finite-time elimination of suboptimal actions, whereas steep ones introduce a vanishing correction that may delay exploitation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty and propose a susceptibility measure to quantify which regularizers are most vulnerable to strategic manipulation.
