StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
arXiv cs.AI / 4/29/2026
Key Points
- The paper introduces StratFormer, a transformer-based meta-agent designed to both model opponents and exploit them in imperfect-information games.
- It uses a two-phase curriculum: first learning opponent behavioral patterns while following a game-theory-optimal (GTO) policy, then gradually shifting toward best-response (BR) exploitation with exploitability-aware regularization.
- The architecture adds dual-turn tokens and bucket-rate features to capture opponent tendencies at multiple decision moments and across five strategic contexts.
- Experiments on Leduc Hold’em against six opponent archetypes show an average gain of +0.106 big blinds (BB) per hand over GTO and a peak gain of +0.821 BB against highly exploitable opponents, while the agent remains nearly as safe as an equilibrium policy.
- Results indicate the method can improve expected performance against weaker or more predictable opponents without fully sacrificing equilibrium-like stability.
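
The two-phase curriculum described above can be illustrated with a minimal sketch. The summary does not give the paper's actual schedule, loss, or hyperparameters, so everything here is an assumption made for illustration: the linear ramp, the names `exploitation_weight`, `mix_policies`, and `regularized_objective`, and the `penalty` coefficient are all hypothetical.

```python
# Illustrative sketch only: the schedule and regularizer are assumptions,
# not the method from the StratFormer paper.

def exploitation_weight(step, warmup_steps, ramp_steps):
    """Return the weight placed on the best-response (BR) policy.

    Phase 1 (step < warmup_steps): weight is 0, so the agent plays pure GTO
    while it observes the opponent.
    Phase 2: weight rises linearly from 0 to 1 over ramp_steps.
    """
    if ramp_steps <= 0:
        raise ValueError("ramp_steps must be positive")
    if step < warmup_steps:
        return 0.0
    return min(1.0, (step - warmup_steps) / ramp_steps)


def mix_policies(gto_probs, br_probs, weight):
    """Blend two action distributions; weight=0 is pure GTO, weight=1 is pure BR."""
    return [(1.0 - weight) * g + weight * b for g, b in zip(gto_probs, br_probs)]


def regularized_objective(expected_gain, exploitability, penalty):
    """Reward exploitation gains but subtract a penalty for becoming exploitable."""
    return expected_gain - penalty * exploitability


if __name__ == "__main__":
    gto = [0.5, 0.5]   # hypothetical GTO action probabilities
    br = [1.0, 0.0]    # hypothetical best response to a predictable opponent
    for step in (0, 100, 200, 300):
        w = exploitation_weight(step, warmup_steps=100, ramp_steps=200)
        print(step, w, mix_policies(gto, br, w))
```

In this sketch a larger `penalty` keeps the blended policy closer to GTO, which is one simple way to trade exploitation gains against safety; the paper may implement this trade-off differently.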


