AlphaZero defeats Stockfish: Quick thoughts

The big news of the day, as far as I’m concerned, is the victory of Google Deepmind’s AlphaZero over Stockfish, currently the highest rated chess engine. This comes barely months after Deepmind’s AlphaGo Zero had bested the earlier avatar of AlphaGo in the game of Go.

Like its Go version, the AlphaZero chess playing machine learnt using reinforcement learning (I remember doing a term paper on the concept back in 2003 but have mostly forgotten). Basically it wasn’t given any “training data”, but the machine trained itself on continuously playing with itself, with feedback given in each stage of learning helping it learn better.

After only about four hours of “training” (basically playing against itself and discovering moves), AlphaZero managed to record this victory in a 100-game match, winning 28 and losing none (the rest of the games were drawn).

There’s a sample game here on the Chess.com website and while this might be a biased sample (it’s likely that the AlphaZero engineers included the most spectacular games in their paper, from which this is taken), the way AlphaZero plays is vastly different from the way engines such as Stockfish have been playing.

I’m not that much of a chess expert (I “retired” from my playing career back in 1994), but the striking things for me from this game were

  • the move 7. d5 against the Queen’s Indian
  • The piece sacrifice a few moves later that was hard to see
  • AlphaZero’s consistent attempts until late in the game to avoid trading queens
  • The move Qh1 somewhere in the middle of the game

In a way (and being consistent with some of the themes of this blog), AlphaZero can be described as a “stud” chess machine, having taught itself to play based on feedback from games it’s already played (the way reinforcement learning broadly works is that actions that led to “good rewards” are incentivised in the next iteration, while those that led to “poor rewards” are penalised. The challenge in this case is to set up chess in a way that is conducive for a reinforcement learning system).

Engines such as StockFish, on the other hand, are absolute “fighters”. They get their “power” by brute force, by going down nearly all possible paths in the game several moves down. This is supplemented by analysis of millions of existing games of various levels which the engine “learns” from – among other things, it learns how to prune and prioritise the paths it searches on. StockFish is also fed a database of chess openings which it remembers and tries to play.

What is interesting is that AlphaZero has “discovered” some popular chess openings through the course of is self-learning. It is interesting to note that some popular openings such as the King’s Indian or French find little favour with this engine, while others such as the Queen’s Gambit or the Queen’s Indian find favour. This is a very interesting development in terms of opening theory itself.

Frequency of openings over time employed by AlphaZero in its “learning” phase. Image sourced from AlphaZero research paper.

In any case, my immediate concern from this development is how it will affect human chess. Over the last decade or two, engines such as stockfish have played a profound role in the development of chess, with current top players such as Magnus Carlsen or Sergey Karjakin having trained extensively with these engines.

The way top grandmasters play has seen a steady change in these years as they have ingested the ideas from engines such as StockFish. The game has become far more quiet and positional, as players seek to gain small advantages which steadily improves over the course of (long) games. This is consistent with the way the engines that players learn from play.

Based on the evidence of the one game I’ve seen of AlphaZero, it plays very differently from the existing engines. Based on this, it will be interesting to see how human players who train with AlphaZero based engines (or their clones) will change their game.

Maybe chess will turn back to being a bit more tactical than it’s been in the last decade? It’s hard to say right now!

Hill Climbing in real life

Fifteen years back, I enrolled for a course on Artificial Intelligence as part of my B.Tech. programme at IIT Madras. It was well before stuff like “machine learning” and “data science” became big, and the course was mostly devoted to heuristics. Incidentally, that term, we had to pick between this course and one on Artificial Neural Networks (I guess nowadays that one is more popular given the hype about Deep Learning?), which meant that I didn’t learn about neural networks until last year or so.

A little googling tells me that Deepak Khemani, who taught us AI in 2002, has put up his lectures online, as part of the NPTEL programme. The first one is here:

In fact, the whole course is available here.

Anyways, one of the classes of problems we dealt with in the course was “search”. Basically, how does a computer “search” for the solution to a problem within a large “search space”?

One of the simplest heuristic is what has come to be known as “hill climbing” (too lazy to look through all of Khemani’s lectures and find where he’s spoken about this). I love computer science because a lot of computer scientists like to describe ideas in terms of intuitive metaphors. Hill climbing is definitely one such!

Let me explain it from the point of view of my weekend vacation in Edinburgh. One of my friends who had lived there a long time back recommended that I hike up this volcanic hill in the city called “Arthur’s Peak“.

On Saturday evening, I left my wife and daughter and wife’s parents (who I had travelled with) in our AirBnB and walked across town (some 3-4 km) to reach Holyrood Palace, from where Arthur’s Seat became visible. This is what I saw: 

Basically, what you see is the side of a hill, and if you see closely, there are people walking up the sides. So what you guess is that you need to make your way to the bottom of the hill and then just climb.

But then you make your way to the base of the hill and see several paths leading up. Which one do you take? You take the path that seems steepest, believing that’s the one that will take you to the top quickest. And so you take a step along that path. And then see which direction to go to climb up steepest. Take another step. Rinse. Repeat. Until you reach a point where you can no longer find a way up. Hopefully that’s the peak.

Most of the time, you are likely to end up on the top of a smaller rock. In any case, this is the hill climbing algorithm.

So back to my story. I reached the base of the hill and set off on the steepest marked path.

I puffed and panted, but I kept going. It was rather windy that day, and it was threatening to rain. I held my folded umbrella and camera tight, and went on. I got beautiful views of Edinburgh city, and captured some of them on camera. And after a while, I got tired, and decided to call my wife using Facetime.

In any case, it appeared that I had a long way to go, given the rocks that went upwards just to my left (I was using a modified version of hill climbing in that I used only marked paths. As I was to rediscover the following day, I have a fear of heights). And I told that to my wife. And then suddenly the climb got easier. And before I knew it I was descending. And soon enough I was at the bottom all over again!

And then I saw the peak. Basically what I had been climbing all along was not the main hill at all! It was a “side hill”, which I later learnt is called the “Salisbury Crags”. I got down to the middle of the two hills, and stared at the valley there. I realised that was a “saddle point”, and hungry and tired and not wanting to get soaked in rain, I made my way out, hailed a cab and went home.

I wasn’t done yet. Determined to climb the “real peak”, I returned the next morning. Again I walked all the way to the base of the hill, and started my climb at the saddle point. It was a tough climb – while there were rough steps in some places, in others there was none. I kept climbing a few steps at a time, taking short breaks.

One such break happened to be too long, though, and gave me enough time to look down and feel scared. For a long time now I’ve had a massive fear of heights. Panic hit. I was afraid of going too close to the edge and falling off the hill. I decided to play it safe and turn back.

I came down and walked across the valley you see in the last picture above. Energised, I had another go. From what was possibly a relatively easier direction. But I was too tired. And I had to get back to the apartment and check out that morning. So I gave up once again.

I still have unfinished business in Edinburgh!

 

Genetic Algorithms

I first learnt about Genetic Algorithms almost exactly thirteen years ago, when it was taught to me by Prof. Deepak Khemani as part of a course on “artificial intelligence”. I remember liking the course a fair bit, and took a liking to the heuristics and “approximate solutions” after the mathematically intensive algorithms course of the previous semester.

The problem with the course, however, was that it didn’t require us to code the algorithms we had learnt (for which we were extremely thankful back then, since in term 5 of Computer Science at IIT Madras, this was pretty much the only course that didn’t involve too many programming assignments).

As a result, while I had learnt all these heuristics (hill climbing, simulated annealing, taboo search, genetic algorithms, etc.) fairly well in theory, I had been at a complete loss as to how to implement any of them. And so far, during the course of my work, I had never had an opportunity to use any of these techniques. Until today that is.

I can’t describe the problem here since this is a client assignment, but when I had to pick a subset from a large set that satisfied certain properties, I knew I needed a method that would reach the best subset quickly. A “fitness function” quickly came to mind and it was obvious that I should use genetic algorithms to solve the problem.

The key with using genetic algorithms is that you need to be able to code the solution in the form of a string, and then define functions such as “crossover” and “mutation”. Given that I was looking for a subset, coding it as a string was rather easy, and since I had unordered subsets, the crossover was also easy – basic random number generation. Within fifteen minutes of deciding I should use GA, the entire algorithm was in my head. It was only a question of implementing it.

As I started writing the code, I started getting fantasies of being able to finish it in an hour and then write a blog post about it. As it happened, it took much longer. The first cut took some three hours (including some breaks), and it wasn’t particularly good, and was slow to converge.  I tweaked things around a bit but things didn’t improve by much.

And that was when I realise that I had done the crossover wrong – when two strings have elements in common and need to be crossed over, I had to take care that elements in common did not repeat into the same “child” (needed the subsets to be of a certain length). So that needed some twist in the code. That done, the code still seemed inefficient.

I had been doing the crossover wrong. If I started off with 10 strings, I would form 5 pairs from them (each participating in exactly one pair) which would result in 10 new strings. And then I would put these 20 (original 10 and new 10) strings through a fitness test and discard the weakest 10. And iterate. The problem was that the strongest strings had as much of a chance of reproducing as the weakest. This was clearly not right.

So I tweaked the code so that the fitter strings had a higher chance of reproducing than the less fit. This required me to put the fitness test at the beginning of each iteration rather than the end. I had to refactor the code a little bit to make sure I didn’t repeat computations. Now I was drawing pairs of strings from the original “basket” and randomly crossing them over. And putting them through the fitness test. And so forth.

I’m extremely happy with the results of the algorithm. I’ve got just the kind of output that I had expected. More importantly, I was extremely happy with the process of coding the whole thing in. I did the entire coding in R, which is what I use for my data analysis (data size meant I didn’t need anything quicker).

The more interesting part is that this only solved a very small part of the problem I’m trying to solve for my client. Tomorrow I’m back to solving a different part of the problem. Genetic algorithms have served their purpose. Back when I started this assignment I had no clue I would be using genetic algorithms. In fact, I had no clue what techniques I might use.

Which is why I get annoyed when people ask me what kind of techniques I use in my problem solving. Given the kind of problems I take on, most will involve a large variety of math, CS and statistics techniques, each of which will only play a small role in the entire solution. This is also the reason I get annoyed when people put methods they are going to use to solve the problem on their pitch-decks. To me, that gives an impression that they are solving a toy version of the problem and not the real problem – or that the consultants are grossly oversimplifying the problem to be solved.

PS: Much as some people might describe it that way, I wouldn’t describe Genetic Algorithms as “machine learning”. I think there’s way too much intervention on the part of the programmer for it to be described thus.