Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play
arXiv cs.AI / 4/21/2026
Key Points
- The paper introduces STRATAGEM, a new self-play approach aimed at improving language models’ transferable reasoning rather than overfitting to game-specific heuristics.
- It targets two barriers to transfer: domain specificity, where reasoning stays tied to game semantics, and contextual stasis, where the model fails to develop improved reasoning over time.
- STRATAGEM selectively reinforces self-play trajectories using a Reasoning Transferability Coefficient to favor abstract, domain-agnostic reasoning patterns.
- It also incentivizes progressive improvement with a Reasoning Evolution Reward, encouraging adaptive reasoning development rather than static context learning.
- Experiments across math, general reasoning, and code generation show significant gains, especially on competition-level math. Ablations and human evaluation indicate that both components are essential for transfer.
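
The key points above name two signals, a Reasoning Transferability Coefficient and a Reasoning Evolution Reward, but this summary does not give their formulas. The sketch below is only an illustration of how such signals could modulate self-play training, not the paper's method. It assumes the coefficient is a per-trajectory score in [0, 1] that scales the update, and that the evolution reward is an additive bonus weighted by a hypothetical `beta`. The `Trajectory` class, its fields, and `modulated_weights` are all invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One self-play trajectory. All fields are hypothetical."""
    game_return: float      # game outcome, e.g. +1 for a win, -1 for a loss
    transferability: float  # assumed score in [0, 1]; higher = more domain-agnostic
    evolution_bonus: float  # assumed bonus for reasoning that improves during play


def modulated_weights(trajs: List[Trajectory], beta: float = 0.5) -> List[float]:
    """Return one training weight per trajectory.

    Assumed form (not taken from the paper):
        shaped = game_return + beta * evolution_bonus
        weight = transferability * (shaped - mean(shaped over the batch))
    """
    if not trajs:
        return []
    shaped = [t.game_return + beta * t.evolution_bonus for t in trajs]
    baseline = sum(shaped) / len(shaped)  # batch-mean baseline
    return [t.transferability * (s - baseline) for t, s in zip(trajs, shaped)]


if __name__ == "__main__":
    batch = [
        Trajectory(game_return=1.0, transferability=1.0, evolution_bonus=0.0),
        Trajectory(game_return=-1.0, transferability=0.5, evolution_bonus=0.0),
    ]
    print(modulated_weights(batch))
```

Under these assumptions, a trajectory with a transferability score of zero contributes nothing to the update whatever the game outcome, which is one simple way to express "selective reinforcement" of abstract reasoning over game-specific heuristics.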