Abstract
In this paper we investigate the exploitability of a Follow-the-Regularized-Leader (FTRL) learner with constant step size \eta in n\times m two-player zero-sum games played over T rounds against a clairvoyant optimizer. In contrast with prior analyses, we show that exploitability is an inherent feature of the FTRL family rather than an artifact of specific instantiations. First, against a fixed optimizer, we establish a sweeping law of order \Omega(N/\eta), proving that exploitation scales with the number N of the learner's suboptimal actions and vanishes in their absence. Second, against an alternating optimizer, a surplus of \Omega(\eta T/\mathrm{poly}(n,m)) can be guaranteed with high probability in random games, regardless of the equilibrium structure. Our analysis once more uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to extract maximum surplus via finite-time elimination of suboptimal actions, whereas steep ones introduce a vanishing correction that may delay exploitation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty, and we propose a susceptibility measure to quantify which regularizers are most vulnerable to strategic manipulation.
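For reference, a minimal sketch of the learner's update studied here, assuming the standard FTRL form over the simplex \Delta_n with a regularizer R and payoff matrix A (notation not fixed in the abstract; y_s denotes the optimizer's strategy at round s):
\[
  x_{t+1} \;\in\; \operatorname*{arg\,max}_{x \in \Delta_n}
  \Bigl\{\, \eta \sum_{s=1}^{t} \langle x, A y_s \rangle \;-\; R(x) \,\Bigr\}.
\]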