StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games

arXiv cs.AI / 4/29/2026


Key Points

The paper introduces StratFormer, a transformer-based meta-agent designed to both model opponents and exploit them in imperfect-information games.
It uses a two-phase curriculum: first learning opponent behavioral patterns while following a game-theoretic optimal (GTO) policy, then gradually shifting toward best-response (BR) exploitation with exploitability-aware regularization.
The architecture adds dual-turn tokens, feature vectors built at both agent and opponent decision points, together with bucket-rate features that encode opponent tendencies across five strategic contexts.
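On Leduc Hold'em, StratFormer was tested against six opponent archetypes at two strength levels each, with exploitability ranging from 0.15 to 1.26 big blinds (BB) per hand. It gains an average of +0.106 BB per hand over GTO, peaking at +0.821 BB against highly exploitable opponents, while maintaining near-equilibrium safety.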
The results suggest that an agent can win more against weaker or more predictable opponents while keeping its own play close to equilibrium.
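To make the bucket-rate idea concrete, here is a minimal sketch of how per-context action frequencies could be turned into a fixed-length feature vector. It is an illustration only, not the paper's implementation: the summary says there are five strategic contexts but does not name them, so the labels in `CONTEXTS` are placeholders, and the three-action set, the `bucket_rates` function, and the additive smoothing `prior` are all assumptions.

```python
# Hypothetical context labels: the source states that there are five
# strategic contexts but does not name them, so these are placeholders.
CONTEXTS = ("ctx_0", "ctx_1", "ctx_2", "ctx_3", "ctx_4")

# Assumed action set for a Leduc Hold'em style game.
ACTIONS = ("fold", "call", "raise")


def bucket_rates(history, prior=1.0):
    """Return smoothed per-context action frequencies as a flat list.

    history: iterable of (context, action) pairs observed for one opponent.
    prior:   pseudo-count added to every (context, action) cell so that
             unseen contexts fall back to a uniform rate (an assumption).
    """
    counts = {c: {a: prior for a in ACTIONS} for c in CONTEXTS}
    for context, action in history:
        counts[context][action] += 1.0

    features = []
    for c in CONTEXTS:
        total = sum(counts[c].values())
        features.extend(counts[c][a] / total for a in ACTIONS)
    return features


# An opponent seen raising three times and folding once in one context.
observed = [("ctx_0", "raise")] * 3 + [("ctx_0", "fold")]
vector = bucket_rates(observed)
```

The resulting vector has one rate per (context, action) pair, 15 values here, and each context's block sums to 1. In a design like the one described above, such a vector would be one input among several to the opponent modeling head.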

Abstract

We present StratFormer, a transformer-based meta-agent that learns to simultaneously model and exploit opponents in imperfect-information games through a two-phase curriculum. The first phase trains an opponent modeling head to identify behavioral patterns from action histories while the agent plays a game-theoretic optimal (GTO) policy. The second phase progressively shifts the policy toward best-response (BR) exploitation, guided by a per-opponent regularization schedule tied to exploitability. Our architecture introduces dual-turn tokens -- feature vectors constructed at both agent and opponent decision points -- coupled with bucket-rate features that encode opponent tendencies across five strategic contexts. On Leduc Hold'em, a small poker variant with six cards and two betting rounds, we test against six opponent archetypes at two strength levels each, with exploitability ranging from 0.15 to 1.26 Big Blinds (BB) per hand. StratFormer achieves an average exploitation gain of +0.106 BB per hand over GTO, with peak gains of +0.821 against highly exploitable opponents, while maintaining near-equilibrium safety.
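The shift from GTO play toward best-response exploitation can be pictured as a weighted blend of two action distributions. The sketch below is one possible reading, not the schedule used in the paper, which the abstract does not specify. The function names `mixing_weight` and `blend_policies`, the linear ramp over training, and the choice to cap the weight by the opponent's exploitability are all assumptions. The only value taken from the source is 1.26 BB per hand, the highest opponent exploitability reported.

```python
def mixing_weight(step, total_steps, exploitability, max_exploitability=1.26):
    """Hypothetical per-opponent weight on the best-response policy.

    The weight ramps linearly with training progress and is capped by the
    opponent's exploitability relative to the most exploitable opponent
    (1.26 BB per hand in the source). Both choices are assumptions.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    cap = min(max(exploitability / max_exploitability, 0.0), 1.0)
    return progress * cap


def blend_policies(gto, br, weight):
    """Convex combination of two action distributions of equal length."""
    return [(1.0 - weight) * g + weight * b for g, b in zip(gto, br)]


# Halfway through training against a moderately exploitable opponent.
w = mixing_weight(step=50, total_steps=100, exploitability=0.63)
policy = blend_policies(gto=[0.2, 0.5, 0.3], br=[0.0, 0.0, 1.0], weight=w)
```

Under these assumptions, a near-equilibrium opponent keeps the weight small and the agent stays close to GTO, while a highly exploitable opponent lets the blend move further toward the best response. Because the result is a convex combination, it remains a valid probability distribution.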


