Tracing the Thought of a Grandmaster-level Chess-Playing Transformer

arXiv cs.LG / 4/14/2026


Key Points

  • The paper presents a sparse decomposition interpretability framework for Leela Chess Zero (LC0), aiming to reveal how its transformer modules compute chess reasoning internally.
  • It decomposes both the MLP and attention components using sparse replacement layers to capture the dominant computation pathways.
  • Through a detailed case study, the authors show the resulting pathways correspond to rich, interpretable tactical considerations that can be empirically verified.
  • The work introduces three quantitative metrics and argues LC0 exhibits parallel reasoning behavior aligned with the inductive bias of its policy head architecture.
  • The authors claim this is the first approach to decompose a transformer’s internal computation across both MLP and attention modules for interpretability, and they make their code publicly available.

Abstract

While modern transformer neural networks achieve grandmaster-level performance in chess and other reasoning tasks, their internal computation process remains largely opaque. Focusing on Leela Chess Zero (LC0), we introduce a sparse decomposition framework to interpret its internal computation by decomposing its MLP and attention modules with sparse replacement layers, which capture the primary computation process of LC0. We conduct a detailed case study showing that the computation pathways recovered by these layers expose rich, interpretable tactical considerations that are empirically verifiable. We further introduce three quantitative metrics and show that LC0 exhibits parallel reasoning behavior consistent with the inductive bias of its policy head architecture. To the best of our knowledge, this is the first work to decompose the internal computation of a transformer on both MLP and attention modules for interpretability. Combining sparse replacement layers and causal interventions in LC0 provides a comprehensive understanding of advanced tactical reasoning, offering critical insights into the underlying mechanisms of superhuman systems. Our code is available at https://github.com/JacklE0niden/Leela-SAEs.
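The abstract describes sparse replacement layers only at a high level. As a rough illustration, the sketch that follows assumes a transcoder-style design for the MLP case: a wide ReLU encoder plus a linear decoder, trained to reproduce the MLP's output under an L1 sparsity penalty. This is a common choice in the interpretability literature but is not confirmed here to be the authors' exact architecture, and the attention-side replacement layers the paper also uses are not covered. The names `SparseReplacementMLP` and `loss_and_step`, the toy MLP standing in for an LC0 layer, and all sizes and hyperparameters are hypothetical. The last lines show a single-feature ablation, one simple form of causal intervention.

```python
import numpy as np


class SparseReplacementMLP:
    """Hypothetical transcoder-style stand-in for one MLP block.

    Reads the MLP's input x and reconstructs the MLP's output through a wide,
    sparsely active feature layer: f = ReLU(x W_enc + b_enc), y_hat = f W_dec + b_dec.
    """

    def __init__(self, d_model, n_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, d_model ** -0.5, (d_model, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0.0, n_features ** -0.5, (n_features, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # ReLU keeps only positively activated features; the L1 term in
        # training pushes most of them to exactly zero.
        return np.maximum(0.0, x @ self.W_enc + self.b_enc)

    def forward(self, x, ablate=None):
        f = self.encode(x)  # fresh array, safe to modify in place
        if ablate is not None:
            f[:, ablate] = 0.0  # causal intervention: silence one feature
        return f @ self.W_dec + self.b_dec, f


def loss_and_step(model, x, y, l1_coeff=1e-3, lr=0.1):
    """Return the MSE + L1 loss, then take one manual gradient-descent step."""
    n, d = y.shape
    z = x @ model.W_enc + model.b_enc
    f = np.maximum(0.0, z)
    y_hat = f @ model.W_dec + model.b_dec
    # f >= 0, so the L1 norm of each row is just its sum.
    loss = np.mean((y_hat - y) ** 2) + l1_coeff * np.sum(f) / n

    # Backpropagate by hand (all gradients use the pre-update weights).
    dy = 2.0 * (y_hat - y) / (n * d)
    df = dy @ model.W_dec.T + l1_coeff / n
    dz = df * (z > 0)
    model.W_dec -= lr * (f.T @ dy)
    model.b_dec -= lr * dy.sum(axis=0)
    model.W_enc -= lr * (x.T @ dz)
    model.b_enc -= lr * dz.sum(axis=0)
    return loss


# Toy "MLP" to be replaced (a stand-in; no LC0 weights are loaded here).
rng = np.random.default_rng(1)
d_model, n_features, n = 8, 32, 256
W1 = rng.normal(0.0, d_model ** -0.5, (d_model, 4 * d_model))
W2 = rng.normal(0.0, (4 * d_model) ** -0.5, (4 * d_model, d_model))
x = rng.normal(size=(n, d_model))
y_mlp = np.maximum(0.0, x @ W1) @ W2

model = SparseReplacementMLP(d_model, n_features)
losses = [loss_and_step(model, x, y_mlp) for _ in range(500)]

# Ablate feature k and measure how the replacement layer's output changes.
k = 0
y_full, f = model.forward(x)
y_abl, _ = model.forward(x, ablate=k)
effect = y_full - y_abl
```

Because the decoder is linear, zeroing feature `k` changes the output by exactly `f[:, k]` times row `k` of `W_dec`, which makes a single feature's contribution easy to attribute. An intervention on a real network would presumably patch the modified output back into the model and measure downstream changes rather than stopping at the layer's own output.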