Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning
arXiv cs.LG / 4/8/2026
Key Points
- The paper studies how LLM reasoning in chess improves as training progresses from supervised fine-tuning (SFT) to reinforcement learning (RL) using theoretically inspired datasets.
- It finds that SFT on direct best-move prediction can make subsequent RL effective and yield strong downstream performance, but the resulting models may produce unfaithful reasoning that is inconsistent with the move they ultimately select.
- Training on multi-move trajectories instead achieves similar downstream chess performance while improving “faithful reasoning” and making RL training more stable (a data-construction sketch follows this list).
- The authors report that RL shifts the distribution of move quality toward stronger moves and reduces hallucination rates, and they identify SFT-checkpoint metrics (evaluation, hallucinations, reasoning quality) that predict post-RL performance (an evaluation sketch also follows below).
- They release checkpoints, final models, training data, evaluations, and code, claiming a 7B-parameter model that surpasses leading open-source reasoning models in chess.
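The paper's multi-move trajectory training is described here only at summary level. As a minimal sketch of what such data construction might look like, assuming the `python-chess` library and a hypothetical `horizon` parameter for the continuation length (neither is confirmed by the paper), each position becomes a prompt whose target is the next several moves rather than a single best move:

```python
import chess

def trajectory_examples(san_moves, horizon=3):
    """Turn one game (a list of SAN moves) into SFT examples whose
    target is a short continuation rather than a single best move."""
    board = chess.Board()
    examples = []
    for i, san in enumerate(san_moves):
        continuation = san_moves[i:i + horizon]  # next `horizon` moves
        examples.append({
            "prompt": f"FEN: {board.fen()}\nPlay the next {len(continuation)} moves:",
            "target": " ".join(continuation),
        })
        board.push_san(san)  # advance the board to the next position
    return examples

# e.g. trajectory_examples(["e4", "e5", "Nf3", "Nc6"]) yields one example
# per position, each targeting up to three consecutive moves.
```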
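Similarly, the reported move-quality and hallucination metrics suggest engine-based scoring. A hedged sketch, assuming `python-chess` driving a locally installed Stockfish binary (the engine path, search depth, and centipawn-loss metric are illustrative assumptions, not details from the paper):

```python
import chess
import chess.engine

def score_moves(positions_and_moves, engine_path="stockfish", depth=12):
    """Compute centipawn loss for each (FEN, SAN move) pair and the
    fraction of illegal ("hallucinated") moves."""
    losses, illegal = [], 0
    limit = chess.engine.Limit(depth=depth)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        for fen, san in positions_and_moves:
            board = chess.Board(fen)
            try:
                move = board.parse_san(san)  # raises ValueError if illegal
            except ValueError:
                illegal += 1  # count the model's illegal move as a hallucination
                continue
            # Engine evaluation before the move, from the mover's point of view.
            mover = board.turn
            best = engine.analyse(board, limit)["score"].pov(mover).score(mate_score=100000)
            board.push(move)
            # Evaluation after the model's move, from the same side's point of view.
            played = engine.analyse(board, limit)["score"].pov(mover).score(mate_score=100000)
            losses.append(best - played)
    return losses, illegal / max(len(positions_and_moves), 1)

# A positive shift in move quality would show up as the distribution of
# `losses` concentrating near zero after RL.
```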