AlphaZero Revisited

It’s been over a year since Google’s DeepMind first made a splash with AlphaZero, its reinforcement-learning-based chess engine. The first anniversary of AlphaZero’s release coincided with the publication of the peer-reviewed paper.

To go with the peer-reviewed paper, DeepMind has released a further 200 games played between AlphaZero and the conventional chess engine Stockfish. The set is again heavily loaded in favour of wins for AlphaZero, but it also contains six games that AlphaZero lost. I’ve been following these games on GM Daniel King’s excellent Powerplaychess channel, and want to revise my opinion on AlphaZero.

Back then, I had looked at AlphaZero’s play through my favourite “studs and fighters” framework, which in hindsight doesn’t do full justice to AlphaZero. From the games I’ve seen in this newly released set, AlphaZero’s play hasn’t exactly been “stud”. It’s just that it’s much more “human”. And the reason AlphaZero’s play seems more human is possibly the way it “learns”.

Conventional chess engines evaluate a position by considering all possible paths (ok, not really – they use an intelligent method called alpha-beta pruning to limit the size of the search), and then play the move that leads to the best position at the end of the search. These engines evaluate positions using “pre-learnt human concepts”, such as point counts for the different pieces. And this leads to a certain kind of play.
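For the curious, the core idea of alpha-beta pruning can be sketched in a few lines of Python. The game tree and leaf values below are made up purely for illustration; real engines add move ordering, transposition tables, iterative deepening and much more.

```python
# Alpha-beta pruning: minimax search that skips branches which
# provably cannot change the final choice. Toy illustration only.

def alphabeta(node, alpha, beta, maximizing):
    # Leaves carry a static evaluation (a number);
    # internal nodes are lists of child nodes.
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # the opponent will never allow this line,
                break           # so the remaining children are pruned
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# A tiny hand-built tree: its minimax value is 6, and some leaves
# are never evaluated at all thanks to the cutoffs.
tree = [[[5, 6], [7, 4, 5]], [[3]]]
value = alphabeta(tree, float("-inf"), float("inf"), True)
```

The same search without the two `break` statements would visit every leaf; the pruning changes nothing about the answer, only how much work is done to reach it.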

AlphaZero’s learning process, however, involves playing zillions of games against itself (since I wrote that previous post, I’ve come back up to speed with reinforcement learning). Based on the results of these games, it evaluates, in hindsight, the positions it reached in the course of play. On top of this, it builds a deep learning model to identify the goodness of positions.
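In miniature, that hindsight-evaluation step can be sketched like this: every position that occurred in a finished game is credited with that game’s result, and a position’s value estimate is simply the average result over the games it appeared in. The position names and results below are invented for illustration; the real system trains a neural network on targets of this kind rather than keeping a lookup table.

```python
# Hindsight position evaluation, in miniature: credit every position
# in a finished game with that game's result, and estimate a
# position's value as its average result across games.
from collections import defaultdict

def evaluate_in_hindsight(games):
    """games: list of (positions_visited, result) pairs, with result
    from the learner's point of view: 1 win, 0 draw, -1 loss."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for positions, result in games:
        for pos in positions:
            totals[pos] += result
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}

# Invented self-play summaries: "open-centre" occurred only in wins,
# "closed-centre" only in a loss.
games = [
    (["start", "open-centre", "king-attack"], 1),
    (["start", "open-centre", "endgame"], 1),
    (["start", "closed-centre", "endgame"], -1),
]
values = evaluate_in_hindsight(games)
```

A lookup table like this cannot generalise to positions it has never seen, which is exactly the gap the deep learning model fills.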

From my limited knowledge of how deep learning works, this process involves AlphaZero learning about “features” of games that have more often than not enabled it to win. So somewhere in the network there will be a node that represents “control of the centre”. Another node deep in the network might represent “safety of the king”. Yet another might perhaps involve an “open a-file”.

Of course, none of these features have been pre-specified to AlphaZero. It has simply learnt it by training its neural network on zillions of games it has played against itself. And while deep learning is hard to “explain”, it is likely to have so happened that the features of the game that AlphaZero has learnt are remarkably similar to the “features” of the game that human players have learnt over the centuries. And it is because of the commonality in these features that we find AlphaZero’s play so “human”.

Another way to look at it is through the concept of “10000 hours” that Malcolm Gladwell wrote about in his book Outliers. As I had written in my review of the book, the concept of 10000 hours can be thought of as “putting fight until you get enough intuition to become stud”. AlphaZero, thanks to its large number of processors, has effectively spent much more than “10000 hours” playing against itself, with its neural network constantly “learning” from the positions faced and the outcomes of the games reached. And this way, it has “gained intuition” over features of the game that lead to wins, giving it an air of “studness”.

The interesting thing to me about AlphaZero’s play is that thanks to its “independent development” (in a way like the finches of the Galapagos), it has not been burdened by human intuition on what is good or bad, and has learnt its own heuristics. And along the way, it has come up with a bunch of heuristics that have not commonly been used by human players.

Keeping bishops on the back rank (once the rooks have been connected), for example. A stronger preference for bishops over knights than humans have. Suddenly simplifying from a terrifying-looking attack into a winning endgame (machines are generally good at endgames, so this is not that surprising). Temporary pawn and piece sacrifices. And all that.

Thanks to engines such as LeelaZero, we can soon see the results of these learnings being applied to human chess as well. And human chess can only become better!

Magnus Carlsen’s Endowment

Game 12 of the ongoing Chess World Championship match between Magnus Carlsen and Fabiano Caruana ended in an unexpected draw after only 31 moves, when Carlsen, in a clearly better position and clearly ahead on time, made an unexpected draw offer.

The match will now go into a series of tie-breaks, played with ever-shortening time controls, as the world looks for a winner. Given the players’ historical record, Carlsen is the favourite for the rapid playoffs. And he knows it – starting from game 11, he seemed to be playing towards taking the match into the playoffs.

Yesterday’s Game 12 was a strange one. It started off with a sharp Sicilian Pelikan like games 8 and 10, and then, between moves 15 and 20, the players repeated the position twice. Now, the rules of chess state that if the same position appears three times on the board, a draw can be claimed. And there was this move where Caruana had the chance to repeat the position for the third time, thus drawing the game.
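Incidentally, the repetition rule is easy to mechanise: keep a count of every position reached, and flag a claimable draw on the third occurrence. A toy Python sketch follows; in a real implementation the position key must also encode the side to move, castling rights and en passant rights, not just the piece placement.

```python
# Toy threefold-repetition tracker: counts occurrences of each
# position key and reports when the third occurrence arrives.
from collections import Counter

def track_repetitions(position_keys):
    """position_keys: sequence of hashable position identifiers, one
    per half-move. Returns the index at which a threefold repetition
    first arises, or None if it never does."""
    seen = Counter()
    for i, key in enumerate(position_keys):
        seen[key] += 1
        if seen[key] == 3:
            return i
    return None

# Two players shuffling back and forth: "p1" appears for the
# third time at index 4, where a draw becomes claimable.
first_claim = track_repetitions(["p1", "p2", "p1", "p2", "p1"])
```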

He spent nearly half an hour on the move, and at the end of it, he decided to deviate. In other words, no quick draw. My suspicion is that this unnerved Carlsen, who decided to then take a draw at the earliest opportunity available to him (the rules of the match state that a draw cannot be agreed before move 30; Carlsen made his offer on move 31).

In behavioural economics, Endowment Effect refers to the bias where you place a higher value on something you own than on something you don’t own. This has several implications, all of which can lead to potentially irrational behaviour. The best example is “throwing good money after bad” – if you have made an investment that has lost money, rather than cutting your losses, you double down on the investment in the hope that you’ll recoup your losses.

Another implication is that even when it is rational to sell something you own, you hold on because of the irrationally high value you place on it. The endowment effect also has an impact in pricing and negotiations – you don’t mind that “convenience charge” that the travel aggregator adds on just before you enter your credit card details, for you have already mentally “bought” the ticket, and this convenience charge is only a minor inconvenience. Once you are convinced that you need to do a business deal, you don’t mind if the price moves away from you in small marginal steps – once you’ve made the decision that you have to do the deal, these moves away are only minor, and well within the higher value you’ve placed on the deal.

So where does this fit into Carlsen’s draw offer yesterday? It was clear from the outset that Carlsen was playing for a draw. When the position was repeated twice, it raised Carlsen’s hopes that the game would be a draw, and he assumed he was getting the draw he wanted. When Caruana refused to repeat the position, and did so after a really long think, Carlsen suddenly realised that he wasn’t getting the draw he thought he was getting.

It was as if the draw was Carlsen’s and it had now been taken away from him, so now he needed to somehow get it. Carlsen played well after that, and Caruana played badly, and the engines clearly showed that Carlsen had an advantage when the game crossed move 30.

However, having “accepted” a draw earlier in the game (by repeating moves twice), Carlsen wanted to lock in the draw, rather than play on in an inferior mental state and risk a loss (which would also result in the loss of the Championship). And hence, despite the significantly superior position, he made the draw offer, which Caruana was only happy to accept (given his worse situation).

AlphaZero defeats Stockfish: Quick thoughts

The big news of the day, as far as I’m concerned, is the victory of Google DeepMind’s AlphaZero over Stockfish, currently the highest rated chess engine. This comes barely months after DeepMind’s AlphaGo Zero had bested the earlier avatar of AlphaGo in the game of Go.

Like its Go version, the AlphaZero chess playing machine learnt using reinforcement learning (I remember doing a term paper on the concept back in 2003, but have mostly forgotten it). Basically, it wasn’t given any “training data”; instead, the machine trained itself by continuously playing against itself, with the feedback from each stage of learning helping it learn better.

After only about four hours of “training” (basically playing against itself and discovering moves), AlphaZero managed to record this victory in a 100-game match, winning 28 and losing none (the rest of the games were drawn).

There’s a sample game here on the Chess.com website and while this might be a biased sample (it’s likely that the AlphaZero engineers included the most spectacular games in their paper, from which this is taken), the way AlphaZero plays is vastly different from the way engines such as Stockfish have been playing.

I’m not that much of a chess expert (I “retired” from my playing career back in 1994), but the striking things for me from this game were:

  • the move 7. d5 against the Queen’s Indian
  • The piece sacrifice a few moves later that was hard to see
  • AlphaZero’s consistent attempts until late in the game to avoid trading queens
  • The move Qh1 somewhere in the middle of the game

In a way (and being consistent with some of the themes of this blog), AlphaZero can be described as a “stud” chess machine, having taught itself to play based on feedback from games it’s already played (the way reinforcement learning broadly works is that actions that led to “good rewards” are incentivised in the next iteration, while those that led to “poor rewards” are penalised. The challenge in this case is to set up chess in a way that is conducive to a reinforcement learning system).
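That feedback loop can be caricatured in a few lines of Python. The “game” here is a two-option bandit rather than chess, and all names and numbers are invented for illustration: each action’s preference is nudged towards the reward it produced, so the better action ends up preferred after enough iterations.

```python
# Caricature of reinforcement-learning feedback: actions whose
# outcomes were good get their preference nudged up, bad ones down.
import random

def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = {"good_move": 0.0, "bad_move": 0.0}      # learned preferences
    rewards = {"good_move": 1.0, "bad_move": -1.0}   # known only via play
    for _ in range(steps):
        action = rng.choice(sorted(prefs))           # explore uniformly
        # reinforce: move this action's preference toward its reward
        prefs[action] += lr * (rewards[action] - prefs[action])
    return prefs

prefs = train()
```

Real systems replace the table with a neural network and the uniform exploration with something far cleverer, but the incentivise/penalise loop is the same shape.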

Engines such as Stockfish, on the other hand, are absolute “fighters”. They get their “power” by brute force, searching nearly all possible paths in the game several moves deep. This is supplemented by analysis of millions of existing games of various levels, which the engine “learns” from – among other things, it learns how to prune and prioritise the paths it searches. Stockfish is also fed a database of chess openings, which it remembers and tries to play.

What is interesting is that AlphaZero has “discovered” some popular chess openings through the course of its self-learning. Some popular openings such as the King’s Indian or the French find little favour with this engine, while others such as the Queen’s Gambit or the Queen’s Indian find favour. This is a very interesting development in terms of opening theory itself.

Frequency of openings over time employed by AlphaZero in its “learning” phase. Image sourced from AlphaZero research paper.

In any case, my immediate concern from this development is how it will affect human chess. Over the last decade or two, engines such as Stockfish have played a profound role in the development of chess, with current top players such as Magnus Carlsen or Sergey Karjakin having trained extensively with these engines.

The way top grandmasters play has seen a steady change over these years as they have ingested ideas from engines such as Stockfish. The game has become far quieter and more positional, as players seek to gain small advantages that steadily improve over the course of (long) games. This is consistent with the way the engines that players learn from play.

Based on the evidence of the one game I’ve seen of AlphaZero, it plays very differently from the existing engines. Based on this, it will be interesting to see how human players who train with AlphaZero based engines (or their clones) will change their game.

Maybe chess will turn back to being a bit more tactical than it’s been in the last decade? It’s hard to say right now!

Curation, editing and predictability

One of my favourite lunchtime hobbies over the last year has been watching chess videos. My favourite publishers in this regard are GM Daniel King and Mato Jelic. King is a far superior analyst and goes into more depth while analysing games, though Jelic has a far larger repertoire (King usually only analyses games the day they were played).

In some ways I might be biased towards Jelic because his analysis and focus are largely in line with my strengths back in my days as a competitive chess player: deep opening analysis, attacking games, the occasional tactical flourish and so on. He has a particular fondness for the games of Mikhail Tal, showering praise on Tal’s sometimes erratic and seemingly purposeless sacrifices.

Once you watch a few videos of Jelic, though, you realise that there is a formula to his commentary. At some point in the game, he announces that the game is in a “critical position” and asks the viewer to pause the video and guess the next move. And a few seconds of pause later, he proceeds to show the move and move on with the game.

While this is an interesting exercise the first few times around, after a while I started seeing a pattern – Jelic has a penchant for attacking positions, and the moves following his “critical positions” are more often than not sacrifices. And once I figured this out, I started explicitly looking for sacrifices or tactical combinations every time he asked me to pause, and that has made the exercise a lot less fun.

I’d mentioned on this blog a few weeks back my problem with watching movies – that I’m constantly trying to second-guess the rest of the movie based on the information provided thus far. And when a movie gets too predictable, it tends to lose my attention. Thinking about it, I believe it is sometimes the curation or editing that makes things too predictable.

To take an example, my wife and I have been watching Masterchef Australia this year (no spoilers, please!), and I remarked to her the other day that episodes have been too predictable – at the end of every contest, it seems rather easy to predict who might win or go down, and so there has been little element of surprise in the show.

My wife remarked that this was not due to the nature of the competition itself (which she said is as good as in earlier editions), but due to the poor editing of the show – during each competition, a disproportionate amount of time is dedicated to showing the spectacularly good and spectacularly bad performances.

Consequently, just this information – on who the show’s editors have chosen to focus on for the particular episode – conveys a sufficient amount of information on each person’s performance, without even seeing what they’ve made! A more equitable distribution of footage across competitors, on the other hand, would do a better job of keeping the viewers guessing!

It is similar in the case of Jelic’s videos. There is a pattern to the game situation where he pauses, which biases the viewer in terms of guessing what the next move will be. In order to make the experience superior for his viewers, Jelic should mix it up a bit, occasionally showing slow Carlsen-like positions, and stopping games at positional “critical positions”, for example. That can make the pauses more interesting, and improve viewer experience!

What are other situations where bad editing effectively gives away the plot, and diminishes the experience?

The importance of queen side counterplay

Back in 1994 when I was still playing competitive chess (I practically retired in a year’s time after a series of blunders under pressure), I had played in this one special tournament that was played to “prepare Karnataka youngsters for national events”. Though I wasn’t travelling to any of these events, being a “promising youngster” I had received an invitation to play.

It was a weird kind of tournament, for apart from us “youngsters”, there were these senior players from the state who participated in the tournament on and off. Their scores weren’t tallied – they were there just to ensure that each “youngster” played an equal number of games against a “senior player”, and only the youngsters’ scores counted.

In the first round of the tournament, I faced off against a senior named Nagesh (if I remember correctly). Nagesh played white and played a King’s Indian Attack against my Sicilian Defence (part of this special tournament was to expose us to non-standard openings and plays). It was a hard fought middle and end game where experience ultimately prevailed, and I lost.

In the analysis after the game, Nagesh pointed out that while he had an established centre and a strong king side attack, I had managed to build up a fairly expansive position on the queen side, and that I should have “pushed harder on the queenside for counterplay” rather than simply defending. While I took his point, I didn’t see the point of expanding on the queen side to grab a couple of pawns and (with a remote chance) threaten to queen one of my pawns there while my king was under heavy attack.

This bewilderment continued through the next year, as I studied openings for which the stated strategy was to “get counterplay on the queenside”. Not being a particularly great endgame player (though I did show some promise there in my brief career), the advantage to be gained from winning a pawn was lost on me, and I would prefer to go for a more tactical game (which usually didn’t go too well).

As an adult, while I don’t play competitively any more, I continue to follow chess and watch videos from time to time for entertainment. I’ve developed more nuance on strategy, and in playing a positional game. I’ve seen how small advantages (like space, or even a pawn) can be turned into decisive victories, and given myself shit for not learning to play endgames better back during my playing career. It’s a more holistic view of chess than the one I had formed as a schoolboy having mugged up all the moves of Morphy’s 17-move win against the Duke of Brunswick and Count Isouard (I still remember that game by heart).

Though it doesn’t take much convincing now for me to appreciate the joys of positional play, and going for queen side counterplay when your king is under attack, I found the game played by Viswanathan Anand against Veselin Topalov in the first round of the ongoing Candidates tournament rather interesting.

The two players go for different strategies – while Topalov builds up for an attack against Anand’s king, Anand goes for queen side counterplay (the bit I didn’t get back when I was a young player) and goes pawn grabbing. It was a rather complex game and both players played rather inaccurately under time pressure, but it is an excellent example of how queen side counterplay can help defuse an attack.

Anand’s queen nearly gets trapped (in the press conference after the game, he said he was reconciled to giving it up if attacked). Topalov stacks up a massive piledriver of pieces on the king side to attack Anand’s king. There is absolutely no danger to Topalov’s own king.

Yet, from time to time, Anand’s pawn grabbing strategy means Topalov has to move back some pieces to the queen side for its defence, blunting the attack. Then, Topalov needs to recover lost material, and moves his rooks to the queen side for that purpose. There is a mad scramble around the time control (both players got into time trouble) when the position gets liquidated with a lot of pieces exchanged.

After the dust settles, we find that Topalov’s remaining pieces are horribly misplaced on the queenside (on a pawn recovery campaign), while Anand’s are now trained towards an attack on Topalov’s king. As Topalov scrambles to defuse this attack, he loses material, and ultimately resigns.

It was a fascinating game to get a potentially fascinating tournament underway. I hope to follow it as best as I can, though that might not be so trivial given the holiday I’m taking later this month. Watching GM Daniel King’s analysis of Anand’s game (linked above) started making me wonder if I’d have played differently had I had access to such high quality commentary when I was still a competitive player two decades ago.

As for that tournament, I ended up beating the other senior player I played against. He blundered his queen in a typical tactical Sicilian Dragon Yugoslav Attack position (I was white). I placed second among all the “youngsters” there, and got my only prize money from chess after that game – a princely Rs. 80 (which wasn’t that bad for a schoolboy in 1994)!

How computers have changed chess

Prior to computers, limited depth of analysis meant chess strategies were “calibrated to model”. Now they’re calibrated to actual results and that results in better strategies (unconstrained by aesthetics)

With the chess Candidates tournament starting in Moscow today (to decide World Champion Magnus Carlsen’s next challenger), I’ve been watching a few chess videos of late, and participating in discussions on why Anand has been finding it hard to play of late.

One thing that people have widely agreed is that computers have changed the way chess is played, and the “new generation” (Carlsen, Hikaru Nakamura, Fabiano Caruana, Anish Giri, etc.) have learnt the game in a completely different way from the old-timers, which dictates the way they play.

For example, these new guys play the kinds of positions that earlier generations wouldn’t dream of playing. Given a position and a bunch of moves that seem similarly strong, the moves the new generation picks are different from those an older player would pick. And computer analysis is credited with this.

The basic advantage with computer analysis is that positions can now be evaluated easily to a much larger “depth” (number of moves from the current position) compared to earlier manual analysis. With manual analysis, you could calculate only a few moves ahead, after which you would reach a position that you had to judge by eye. Judging the different possible continuations this way, you would evaluate a position and figure out what a good continuation was.

The problem with limited depth search was that after a certain depth, you simply had to use your judgment on what was a good position, and this judgment (the “boundary condition” that went into your model) would have a profound effect on how you evaluated different moves. Over time, all you cared about was the aesthetics of the chessboard, and not really how you could translate the position to victory (or a draw).

In other words, in the days before computers, chess players were building their strategies by calibrating them to a model rather than by calibrating them to actual results on the board. And this resulted in a bias towards “pretty strategies” and those that gave advantages that were obvious.

With computers, however, there is no such constraint on the depth of ply. You can analyse the position to far greater depth and get really close to the result in the course of your analysis. And so you don’t really care about the aesthetics of the positions you reach, as long as you know how they can translate to the result you want.
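The effect of that boundary judgment can be illustrated with a toy depth-limited minimax in Python (all positions and numbers below are invented): at the search horizon the engine must fall back on a heuristic, and a shallow search driven by an “aesthetic” heuristic can pick a line that a deeper search, which reaches the true outcomes, rejects.

```python
# Depth-limited minimax: below the cutoff the searcher falls back on
# a heuristic "judgment" of the position, so the heuristic -- not the
# true outcome -- drives the choice. Toy tree, invented numbers.

def minimax(node, depth, maximizing, heuristic):
    value, children = node            # (static eval, list of subtrees)
    if not children:                  # true terminal result of the game
        return value
    if depth == 0:
        return heuristic(node)        # boundary-condition judgment
    f = max if maximizing else min
    return f(minimax(c, depth - 1, not maximizing, heuristic)
             for c in children)

# A position where the "prettier" line (high static eval) actually
# loses, and the ugly-looking line actually wins.
pretty_line = (0.9, [(-1.0, [])])     # looks great, loses
ugly_line = (-0.3, [(1.0, [])])       # looks bad, wins
root = (0.0, [pretty_line, ugly_line])

aesthetic = lambda node: node[0]      # trusts static appearance

def best_child(depth):
    return max(root[1],
               key=lambda c: minimax(c, depth, False, aesthetic))
```

A depth-0 search trusts the heuristic and prefers `pretty_line`; searching one ply deeper reaches the true results and flips the choice to `ugly_line`, which is the pre-computer versus post-computer contrast in miniature.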

So the “new generation”, which has always trained with computers, sees the game differently. Players of Anand’s generation (there are also Veselin Topalov and Levon Aronian in the ongoing Candidates tournament) learnt the game with its classic aesthetics and optimise their play to get there. Carlsen’s generation has no such biases, and they play for the actual advantage, irrespective of aesthetics.

And that’s how the battle is building up! This should be an interesting tournament!

Volleyball

It’s been over eight years since I last played the game, but if I were to pick the one outdoor game I’m best at (relative to other games I’ve played), it’s volleyball. And when I say I’m best at it, that’s on a strictly relative basis – in undergrad, I struggled to get into my hostel team (let alone the college team). It just goes to show how bad I’ve been at other outdoor games! I’m a successful cricket- and football-watcher, though!

The thing with volleyball is that my game runs counter to how I play other games, and to my life in general. In general, I’m an extremely high-risk person (I’m not into adventure sports, but I do ride a Royal Enfield motorcycle) – I take chances where possible and go for the spectacular. It is hard for me to be “accurate” and “correct”, and since I know I’m prone to making mistakes, I try to maximise the output from the times when I don’t make mistakes, and thus take the high-risk path.

So I’ve quit my job without something else in hand four times, now freelance as a management consultant, blog about every damn thing – things that have promises of big upsides, but also risks of downsides. It also reflects in how I sometimes talk to people – I sometimes try too hard to make an impression – which can potentially get me big returns, but end up saying something stupid at times, and end up sounding arrogant at other times. Those are risks I willingly take.

And this risky nature has reflected in most games I’ve played too (though, again, nothing in the recent past). In chess, I get bored of slow, technical Carlsen-esque positions, and am prone to going on Morphy-esque attacks that can backfire spectacularly. Playing bridge, I finesse way more than I’m supposed to – making some otherwise unmakeable contracts, but going down in contracts I should have made.

Back in school, when we played cricket with rubber and tennis balls, I would bowl leg spin, and using a light bat, would try to hit every ball for four or six, rather than trying to bat steadily. And while playing basketball (my “second best” outdoor game, after volleyball) I have a propensity to go for long shots.

What sets volleyball apart is that my game completely runs counter to who I am. In volleyball I’m a solid player – don’t spike too much (can’t jump!!), but can set spikes well, block well and can lead a team well from the back line. In fact, my best volleyball games have been those when the team has had to carry some weak links, and I’ve led from the centre of the back line, lending solidity and helping build up attacks. It definitely doesn’t reflect what I’m like otherwise.

But volleyball has also been the game where I’ve had a large number of spectacular failures. At every level I’ve played, I’ve had some responsibility thrust upon me, and I’ve buckled under the pressure. It’s volleyball that comes to mind every time I let down people’s trust by doing badly at something I’m supposed to be good at.

1. Voyagers versus Pioneers, 1999: This was the school inter-house tournament. We go two sets up. They win the next two. Down to the decider. We lead 14-13, and it’s our turn to serve. Our captain purposely messes up our rotation so that I can serve (I had a big serve – one attacking aspect of my volleyball). The serve clips the net on its way across (back then, a let was a foul serve in volleyball). We lose.

2. NPS Indiranagar versus NPS Rajajinagar, 1999: Then I get selected to represent my school. I’m on the bench, and am subbed in right on time to serve. I decide to warm up with an underarm serve (before I start unleashing my overarm thunders). Hit it into the net. Opponent’s serve comes to me and I receive it badly. Get subbed out.

3. G block versus F block, 2004-05: Semi-finals of the IIMB inter-hostel championship. We have two big spikers, two decent lifters and defenders (including me) and two who had never played volleyball in their lives, but were chosen on the basis of their physical fitness alone. Down to the third set (best of three). We lead 25-24 (new scoring system). I’m playing right forward. The ball comes across the net. All I need to do is set it up for a big spike, but I decide to spike it directly myself. And miss. Then I serve on the next match point. I decide to go for a safe serve, and it gets returned. We lose.

4. Section C versus Section A, 2004-05: Again similar story. I don’t remember the specifics of this, but again it was heartbreak, and I think I missed my serve on match point.

I guess you get the drift…