Tracking vs. Deciding: The Dual-Capability Bottleneck in Searchless Chess Transformers
arXiv cs.AI / 4/1/2026
Key Points
- The paper argues that "searchless" chess transformers, trained only on move sequences, must learn two distinct capabilities whose data needs conflict: state tracking (reconstructing the position from the move history) and decision quality (choosing good moves).
- It formalizes this as a dual-capability bottleneck, in which performance is limited by whichever capability is learned less well. This explains why low-rated games help tracking by adding diversity while high-rated games provide better decision signals, and why removing low-rated data hurts results.
- The authors scale the model from 28M to 120M parameters to improve tracking, then apply Elo-weighted training to improve decision quality while preserving data diversity. The two interventions combine superadditively: the joint gain exceeds the sum of their individual gains.
- Their experiments show that scaling improves tracking, weighting improves decisions, and linear weighting works best among the schemes tested. Overly aggressive weighting can damage tracking even while validation loss keeps decreasing.
- The 120M-parameter model, using no search, reaches a Lichess Bullet rating of about 2570 and achieves 55.2% Top-1 accuracy on human move prediction. Because it takes the move sequence as input, it can exhibit history-dependent behavior that position-only methods lack.
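The Elo-weighting idea above can be illustrated with a minimal Python sketch. It assumes a hypothetical linear scheme in which each training game's loss weight is proportional to its Elo (normalised to average 1.0), and contrasts it with an equally hypothetical exponential scheme standing in for "overly aggressive" weighting. The paper's exact weighting formula is not given in this summary. The Kish effective sample size used here as a rough proxy for data diversity is a swapped-in illustration, not a metric from the source, and all function names are hypothetical.

```python
import math
from typing import Sequence


def linear_weights(elos: Sequence[float]) -> list[float]:
    """Hypothetical linear scheme: weight proportional to game Elo,
    normalised so the weights average 1.0. Low-rated games keep a
    non-zero weight, so they still contribute positional diversity."""
    mean = sum(elos) / len(elos)
    return [e / mean for e in elos]


def exponential_weights(elos: Sequence[float], temperature: float = 100.0) -> list[float]:
    """Hypothetical aggressive scheme: weight grows exponentially with Elo.
    Subtracting the max before exp() avoids overflow; normalised to mean 1.0."""
    top = max(elos)
    raw = [math.exp((e - top) / temperature) for e in elos]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish effective sample size: roughly how many equally weighted
    games a given weighting is worth (illustrative diversity proxy)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def weighted_cross_entropy(move_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of -log p(played move), one entry per training example."""
    numerator = sum(w * -math.log(p) for p, w in zip(move_probs, weights))
    return numerator / sum(weights)


if __name__ == "__main__":
    # Toy ratings spanning beginner to master (made-up, illustrative only).
    elos = [800, 1200, 1500, 1800, 2100, 2400, 2700]
    # Made-up probabilities a model assigned to the move actually played.
    probs = [0.20, 0.30, 0.35, 0.40, 0.50, 0.55, 0.60]
    for name, w in [("linear", linear_weights(elos)),
                    ("exponential", exponential_weights(elos))]:
        ess = effective_sample_size(w)
        loss = weighted_cross_entropy(probs, w)
        print(f"{name:12s} ESS = {ess:.2f} of {len(elos)}  loss = {loss:.3f}")
```

On these toy ratings, linear weighting keeps the effective sample size close to the number of games (about 6.2 of 7), while the exponential scheme concentrates nearly all weight on the top-rated game (about 1.1 of 7). That is one plausible way an aggressive scheme could starve tracking of diverse positions even as loss on high-rated moves improves.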