I built Autochess NN, a browser-playable neural chess engine that started as a personal experiment in understanding AlphaZero-style systems by actually building one end to end.
This project was unapologetically vibecoded - but not in the “thin wrapper around an API” sense. I used AI heavily as a research/coding assistant in a Karpathy-inspired autoresearch workflow: read papers, inspect ideas, prototype, ablate, optimize, repeat. The interesting part for me was seeing how far that loop could go on home hardware (just an ordinary gaming RTX 4090).
Current public V3:
- residual CNN + transformer
- learned thought tokens
- ~16M parameters
- 19-plane 8x8 input
- 4672-move policy head + value head
- trained on 100M+ positions
- pipeline: supervised pretraining on 2200+ rated Lichess games -> Syzygy endgame fine-tuning -> self-play RL with search distillation
- CPU inference + shallow 1-ply lookahead / quiescence (under 2 ms)
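To make the inference step concrete, here's the basic shape of the 1-ply lookahead: the network scores each child position with its value head, and we pick the move whose resulting position is worst for the opponent. This is a toy sketch with stand-ins for the board, move application, and value head - not the engine's actual code:

```python
def one_ply_lookahead(position, legal_moves, apply_move, evaluate):
    """Pick the move minimizing the opponent's value after one ply.

    `evaluate` returns a score from the side-to-move's perspective,
    so we negate it to score the move for the player making it.
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves:
        child = apply_move(position, move)
        score = -evaluate(child)  # value head on the child position, negated
        if score > best_score:
            best_move, best_score = move, score
    return best_move, best_score

# Toy stand-in "game": positions are ints, moves add to the position,
# and the (fake) value head just prefers smaller numbers for the opponent.
best, score = one_ply_lookahead(0, [1, 5, 3],
                                lambda p, m: p + m,
                                lambda p: -p)
print(best, score)  # -> 5 5
```

In the real engine the same loop runs over `legal_moves` of a chess position, with quiescence extending the search on captures/checks.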
I also wrapped it in a browser app so the model is inspectable, not just benchmarked: play vs AI, board editor, PGN import/replay, puzzles, and move analysis showing top-move probabilities and how the “thinking” step shifts them.
What surprised me is that, after a lot of optimization, this may have ended up being unusually compute-efficient for its strength - possibly one of the more efficient hobbyist neural chess engines above 2500 Elo. I’m saying that as a hypothesis to pressure-test, not as a marketing claim, and I’d genuinely welcome criticism on evaluation methodology.
I’m now working on V4 with a different architecture:
- CNN + Transformer + Thought Tokens + DAB (Dynamic Attention Bias), ~50M parameters
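Rough picture of what I mean by a dynamic attention bias: instead of a fixed (e.g. relative-position) bias, a small projection of the input itself predicts an additive bias on the attention logits, so the current position can up- or down-weight square-to-square interactions. This is a simplified single-head numpy sketch of the mechanism, not the actual V4 code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_dynamic_bias(x, wq, wk, wv, wb):
    """Self-attention where an extra projection (wb) of the input
    predicts a per-key additive bias on the attention logits.
    The bias is input-dependent, hence "dynamic"."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    bias = (x @ wb).T              # one scalar per key token, shape (1, T)
    logits = q @ k.T / np.sqrt(d) + bias  # broadcast across all queries
    return softmax(logits) @ v

rng = np.random.default_rng(0)
T, D = 4, 8                        # 4 tokens, 8-dim embeddings
x = rng.standard_normal((T, D))
wq, wk, wv = (rng.standard_normal((D, D)) for _ in range(3))
wb = rng.standard_normal((D, 1))
out = attention_with_dynamic_bias(x, wq, wk, wv, wb)
print(out.shape)  # (4, 8)
```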
For V5, I want to test something more speculative that I’m calling Temporal Look-Ahead: the network internally represents future moves and propagates that information backward through attention to inform the current decision.
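Nothing is implemented yet, but to make the framing concrete: the picture in my head is a token sequence of [board tokens, decision token, K future-move slots], where the decision token is explicitly allowed to attend to the future slots, so information about anticipated continuations flows "backward in time" into the current move. A toy mask sketch of that wiring:

```python
import numpy as np

def lookahead_mask(n_board, n_future):
    """Toy attention mask for the Temporal Look-Ahead idea.

    Tokens = [board (n_board), decision (1), future slots (n_future)].
    Board tokens see only the present; the decision token additionally
    sees the future slots; future slots see everything.
    """
    n = n_board + 1 + n_future
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_board + 1, :n_board + 1] = True  # present attends to present
    mask[n_board, :] = True                  # decision token also sees future slots
    mask[n_board + 1:, :] = True             # future slots attend everywhere
    return mask

m = lookahead_mask(n_board=3, n_future=2)
print(m[3])  # decision row: attends to all 6 tokens
```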
Demo: https://games.jesion.pl
Project details: https://games.jesion.pl/about
Price: free browser demo. Nickname/email are only needed if you want to appear on the public leaderboard.
The feedback I’d value most:
- Best ablation setup for thought tokens / DAB
- Better methodology for measuring Elo-vs-compute efficiency on home hardware
- Whether the Temporal Look-Ahead framing sounds genuinely useful or just fancy rebranding of something already known
- Ideas for stronger evaluation against classical engines without overclaiming
Cheers, Adam