And now some thoughts on a possible approach to making a deep learning Chess engine (thread)
What I'm about to say is completely speculative and may completely fail, may have already been thought of before, or both, but it's an interesting thought I want to get out
Going over how engines work, there are a few misconceptions and a lot of possible mixing and matching which could be done. PUCT looks an awful lot like alpha-beta, and in fact the two could work together: for example, a project to generate a Chess opening book could have
a single machine doing PUCT for the book from the starting position and handing out leaf positions to volunteer machines on the internet to evaluate with an A/B engine (or an NN; lots of mixing and matching is possible)
Feel free to go ahead and build that idea as a project if you like it. Also, notably, PUCT isn't what most people think of as MCTS, in that it doesn't necessarily include many deep rollouts and can even be implemented completely deterministically
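For concreteness, here's a minimal sketch of the PUCT selection rule as used in AlphaZero-style engines. The dict-based bookkeeping and the c_puct value are illustrative assumptions, not anyone's actual implementation, but it shows why the rule is deterministic and rollout-free.

```python
import math

# Hypothetical per-move statistics kept in plain dicts:
#   N[a] visit count, W[a] total value, P[a] prior probability from a policy net.
def puct_select(moves, N, W, P, c_puct=1.5):
    """Return the move maximizing Q + U; note this is fully deterministic."""
    total_visits = sum(N[a] for a in moves)
    best_move, best_score = None, -math.inf
    for a in moves:
        q = W[a] / N[a] if N[a] > 0 else 0.0                          # mean value so far
        u = c_puct * P[a] * math.sqrt(total_visits + 1) / (1 + N[a])  # exploration bonus
        if q + u > best_score:
            best_move, best_score = a, q + u
    return best_move
```

Given the same statistics it always picks the same child, which is why there's nothing inherently 'Monte Carlo' about it.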
General commentary aside, I'd like to draw attention to the Stoofvlees approach, which is to train a neural network to make the same moves as humans made in grandmaster games. It's fairly competitive even though it's had much less work put into it than
Leela, which is currently the best neural network engine, so the approach shows promise. The obvious drawback to the approach is that it's tied to reference games, so it requires getting those games, lacks the elegance and adaptability of being able to work things out from
scratch, and is in some sense limited to the quality of play in those games. Here's an idea for how to fix that, which pushes the edges of what it means for an engine to be 'zero' but still qualifies: start with a database of games; for Chess, a bunch of
human grandmaster games would be a good start (hence the 'not exactly zero' comment). Then run a completely untrained engine some number of moves deep to get board evaluations for every position in every one of those games. These will be very bad evaluations, but they will
at least differentiate won and lost positions. Then train the neural network to match those evaluations directly. Then run the full engine with the new evaluation network on all the positions again to get new evaluations. Then train a new network, run a few moves deep again, and so on.
Intuitively, what hopefully happens is that the first pass effectively sees only a few moves deep, the next pass sees a few moves beyond that (because it searches on top of evaluations that already encode the previous pass's lookahead), and so on. This seems to be much more direct training than working off human games, because an eval is a much more specific
piece of information than a move preference, and the step of training the neural network doesn't have to work through a bunch of conditionals; it just has to train on given inputs and outputs.
There is the hazard that the engine might memorize its own self-reinforced evals, but that can be fixed by dividing the set of positions in two and alternating which half is evaluated and trained on with each training run.
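To make the loop concrete, here's a rough sketch of one way it could look (one reading of the split-and-alternate fix). It assumes positions are stored as something hashable like FEN strings; fresh_network, search_eval, and train_network are hypothetical placeholders for network initialization, a fixed-depth search that uses the current network as its leaf evaluation, and a supervised fit to the resulting evals.

```python
import random

def bootstrap(all_positions, iterations=10, depth=6):
    """Iteratively deepen evals: search with the current net, train a new net on the results."""
    random.shuffle(all_positions)
    half_a = all_positions[: len(all_positions) // 2]
    half_b = all_positions[len(all_positions) // 2:]
    net = fresh_network()  # completely untrained at the start
    for i in range(iterations):
        # Alternate halves each run, so the network generating this run's labels
        # was trained on the other half rather than on these same positions.
        current_half = half_a if i % 2 == 0 else half_b
        targets = {pos: search_eval(pos, net, depth) for pos in current_half}
        net = train_network(net, targets)
    return net
```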
This approach would also work very well with volunteer machines on the internet, because they can calculate evals at relatively high depths, so very little bandwidth is needed for the amount of computation accessed.
If anyone is interested in actually building this, or any related idea inspired by this, please do so. Reports of people already having tried it and either succeeding or failing are welcome as well
