AlphaZero defeats Stockfish: Quick thoughts

The big news of the day, as far as I’m concerned, is the victory of Google DeepMind’s AlphaZero over Stockfish, currently the highest-rated chess engine. This comes just months after DeepMind’s AlphaGo Zero had bested the earlier avatar of AlphaGo at the game of Go.

Like its Go version, the chess-playing AlphaZero learnt using reinforcement learning (I remember doing a term paper on the concept back in 2003 but have mostly forgotten it). Basically, it wasn’t given any “training data”; instead, the machine trained itself by continuously playing against itself, with feedback at each stage of learning helping it play better.

After only about four hours of “training” (basically playing against itself and discovering moves), AlphaZero managed to record this victory in a 100-game match, winning 28 and losing none (the rest of the games were drawn).

There’s a sample game here on the Chess.com website, and while this might be a biased sample (it’s likely that the AlphaZero engineers included the most spectacular games in their paper, from which this game is taken), the way AlphaZero plays is vastly different from the way engines such as Stockfish play.

I’m not that much of a chess expert (I “retired” from my playing career back in 1994), but the striking things for me in this game were:

  • the move 7. d5 against the Queen’s Indian
  • the piece sacrifice a few moves later that was hard to see
  • AlphaZero’s consistent attempts until late in the game to avoid trading queens
  • the move Qh1 somewhere in the middle of the game

In a way (and consistent with some of the themes of this blog), AlphaZero can be described as a “stud” chess machine, having taught itself to play based on feedback from games it has already played. The way reinforcement learning broadly works is that actions that led to “good rewards” are reinforced in the next iteration, while those that led to “poor rewards” are penalised; the challenge in this case is to set up chess in a way that is conducive to a reinforcement learning system. A toy sketch of this feedback loop follows.
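To make the feedback idea concrete, here is a minimal sketch of a self-play reinforcement-learning loop – on the toy game of Nim (take 1–3 sticks; whoever takes the last stick wins) rather than chess. To be clear, this is not the AlphaZero algorithm (which combines deep neural networks with Monte Carlo tree search); all the names and numbers here are my own illustrative choices, showing only that “good rewards reinforce moves, bad rewards penalise them”.

```python
import random

values = {}  # (sticks_remaining, move) -> learned value of that move

def choose_move(sticks, epsilon=0.1):
    """Mostly pick the highest-valued move; occasionally explore at random."""
    moves = [m for m in (1, 2, 3) if m <= sticks]
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=lambda m: values.get((sticks, m), 0.0))

def self_play_game():
    """Play one game against itself; return each side's move history and the winner."""
    sticks, player = 21, 0
    histories = {0: [], 1: []}
    while sticks > 0:
        move = choose_move(sticks)
        histories[player].append((sticks, move))
        sticks -= move
        if sticks == 0:
            return histories, player  # this player took the last stick and wins
        player = 1 - player

def train(games=20000, lr=0.1):
    for _ in range(games):
        histories, winner = self_play_game()
        for player, history in histories.items():
            reward = 1.0 if player == winner else -1.0
            # Nudge every (state, move) on the path towards the final reward:
            # winning moves get reinforced, losing moves get penalised.
            for sticks, move in history:
                old = values.get((sticks, move), 0.0)
                values[(sticks, move)] = old + lr * (reward - old)

train()
# With enough self-play, the values tend towards the known winning strategy
# for this game: always leave the opponent a multiple of 4 sticks.
print(choose_move(21, epsilon=0.0))  # typically 1, leaving 20 sticks
```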

Engines such as Stockfish, on the other hand, are absolute “fighters”. They get their “power” through brute force, searching nearly all possible paths in the game several moves deep. This is supplemented by analysis of millions of existing games of various levels, which the engine “learns” from – among other things, it learns how to prune and prioritise the paths it searches. Stockfish is also fed a database of chess openings, which it remembers and tries to play. A sketch of the brute-force search idea follows.
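Here is a minimal sketch of the core brute-force technique these engines build on: minimax search with alpha–beta pruning (in its compact “negamax” form). The `position` interface (legal_moves, make/undo, evaluate) is a hypothetical stand-in, and real engines add far more on top – move ordering, transposition tables, the opening book mentioned above – so treat this purely as an illustration of the search-and-prune idea.

```python
import math

def alphabeta(position, depth, alpha=-math.inf, beta=math.inf):
    """Return the best achievable score for the side to move,
    searching `depth` plies ahead."""
    if depth == 0 or position.is_game_over():
        # Static evaluation of the leaf, from the side to move's perspective.
        return position.evaluate()
    best = -math.inf
    for move in position.legal_moves():
        position.make(move)
        # Negamax: the opponent's best score, negated, with the
        # search window flipped accordingly.
        score = -alphabeta(position, depth - 1, -beta, -alpha)
        position.undo(move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # opponent has a refutation: prune the rest of this branch
    return best
```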

What is interesting is that AlphaZero “discovered” some popular chess openings in the course of its self-learning. Some popular openings, such as the King’s Indian or the French, find little favour with this engine, while others, such as the Queen’s Gambit or the Queen’s Indian, find favour. This is a fascinating development for opening theory itself.

Frequency of openings over time employed by AlphaZero in its “learning” phase. Image sourced from AlphaZero research paper.

In any case, my immediate concern following this development is how it will affect human chess. Over the last decade or two, engines such as Stockfish have played a profound role in the development of chess, with current top players such as Magnus Carlsen and Sergey Karjakin having trained extensively with these engines.

The way top grandmasters play has seen a steady change over these years as they have ingested ideas from engines such as Stockfish. The game has become far more quiet and positional, as players seek small advantages which they steadily improve over the course of (long) games. This is consistent with the way the engines that players learn from play.

Based on the evidence of the one AlphaZero game I’ve seen, it plays very differently from the existing engines. It will be interesting to see how human players who train with AlphaZero-based engines (or their clones) will change their game.

Maybe chess will turn back to being a bit more tactical than it’s been in the last decade? It’s hard to say right now!

Interview length

When I interviewed for my current job four months back, I was put through over twelve hours of high-quality interviews. This included both telephonic and face-to-face processes (on one day, I was called to the office and grilled from 10:30 am to 6:30 pm), and by “high quality”, I’m referring to the standard of the questions I was asked.

All the interviews were extremely enjoyable, and I had fun solving the problems that were thrown at me. I must mention here that the entire process was a “stud interview” – one that tried to evaluate my thought process rather than what I know. I’ve also been through a few “fighter interviews” – ones where the interviewer just spends time finding out your “knowledge” – and I don’t remember ever taking a job after passing that kind of interview.

Recently I read this post by Seth Godin that someone had shared on Google Reader, where he says that there is just no point in having long interviews, and that interviews should be kept short and to the point. That way, he says, less of everyone’s time gets wasted, and the candidate doesn’t need to spend much time interviewing either. After reading it, I tried to put my personal experience into perspective.

One thing is that in a “stud interview”, where you throw tough problems at the candidate, one of the key “steps” in the solution process is for an insight to strike the candidate. Even if you give hints, and mark liberally for “steps”, the “cracking” of the problem usually depends on an insight. And it isn’t fair to expect an insight to strike the candidate on each and every question, so the way to average out this factor is to ask a large number of questions – which means the interview takes longer. The toy calculation below illustrates this.
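A quick back-of-the-envelope illustration of the averaging argument (the probabilities here are assumed numbers, purely for illustration): suppose a strong candidate cracks any given problem with probability 0.6 and a weak one with probability 0.3. With one question, the two are hard to tell apart; over twelve questions, their score distributions separate cleanly.

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of cracking at least k of n problems,
    if each is cracked independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A single question misclassifies often: the strong candidate passes
# only 60% of the time, and the weak one still passes 30% of the time.
print(p_at_least(1, 1, 0.6), p_at_least(1, 1, 0.3))    # 0.6 vs 0.3

# Over twelve questions with a pass mark of six, the gap widens:
print(p_at_least(6, 12, 0.6), p_at_least(6, 12, 0.3))  # ~0.84 vs ~0.12
```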

The other thing about the length of the interview is signaling. Twelve hours of hardcore problem-solving sends out a signal to the candidate with regard to the quality of the group. It gives an idea to the candidate about what it takes to get into the group. It says that every person working in the group had to go through this kind of a process and hence is likely to be of high quality.

Another thing about the “stud interview” is that it also directly gives the candidate an idea of the quality of the people interviewing him. Typically, hard math-puzzle-based interviews are difficult for the interviewer to conduct. So putting the candidate through a large number of math-problem-solving interviews tells him that the people interviewing him are all good enough to conduct this kind of interview. This kind of interview is also ruthless on the interviewer – it is usually not hard for a smart candidate to see through it if the interviewer has just mugged up the answer to a question without actually solving it.

All put together, when you are recruiting based on “stud interviews”, it makes sense to take time and put the candidate through several rounds. It also helps that most of these “stud interviews” are usually fun for the candidate too. On the other hand, if you only want to test what the candidate knows and are not really interested in the way he thinks, then you might follow Godin’s suggestion and keep the interview short.