Football Elo Application

This morning, I discovered the Club Elo Ratings, and promptly proceeded to analyse Liverpool FC’s performance over the years based on these ratings, and then to break that performance down by manager.

Then, playing around with the data of different clubs, I realised that there are plenty more stories to be told using this data, and they are best told by people who are passionate about their respective clubs. So the best thing I could do is to put the data out there (in a form similar to what I did for Liverpool), so that people can analyse how their clubs have performed over the years, and under different managers.

Sitting beside me as I was doing this analysis, my wife popped in with a pertinent observation. Now, she doesn’t watch football. She hates it that I watch so much football. Nevertheless, she has a strong eye for metrics. And watching me analyse club performance by manager, she asked me if I could analyse manager performance by club!

And so I’ve added that as well to the Shiny app that I’ve built. It might look a bit clunky, with two seemingly unrelated graphs stacked one on top of the other, but since the two are strongly related, it makes sense to have both in the same app. The managers listed in the bottom dropdown are those who have managed at least two clubs in the Premier League.
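For the record, the filtering rule behind that dropdown is as simple as it sounds – a quick sketch in Python/pandas (the app itself is built with R and Shiny, and the stints table here is hypothetical):

```python
import pandas as pd

# Hypothetical table of Premier League managerial stints: one row per (manager, club) spell
stints = pd.DataFrame({
    "manager": ["Rafael Benitez", "Rafael Benitez", "Jose Mourinho",
                "Jose Mourinho", "Pep Guardiola"],
    "club":    ["Liverpool", "Newcastle United", "Chelsea",
                "Manchester United", "Manchester City"],
})

# Keep only managers who have managed at least two different Premier League clubs
clubs_per_manager = stints.groupby("manager")["club"].nunique()
dropdown_managers = clubs_per_manager[clubs_per_manager >= 2].index.tolist()
print(dropdown_managers)   # ['Jose Mourinho', 'Rafael Benitez']
```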

If you’re interested in Premier League football, you should definitely check out the app. I think there are some interesting insights to be gleaned (such as what I presented in this morning’s post).

Built by Shanks

This morning, I found this tweet by John Burn-Murdoch, a statistician at the Financial Times, about a graphic he had made for a Simon Kuper (of Soccernomics fame) piece on Jose Mourinho.

Burn-Murdoch also helpfully shared the code he had written to produce this graphic, through which I discovered ClubElo, a website that produces chess-style Elo ratings for football clubs. They have a free and open API, through which Burn-Murdoch got the data for the above graphic, and which I used to download all-time Elo ratings for all clubs available (I can be greedy that way).
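For anyone who wants to play along, getting the data is trivial – ClubElo serves each club’s full rating history as plain CSV. A sketch in Python; the exact endpoint and column names are best verified against the API documentation on clubelo.com:

```python
import pandas as pd

# ClubElo serves a club's full Elo history as plain CSV
# (check clubelo.com's API page for the exact endpoint format)
url = "http://api.clubelo.com/Liverpool"
elo = pd.read_csv(url, parse_dates=["From", "To"])
print(elo.tail())
```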

So the first order of business was to see how Liverpool’s rating has moved over time. The initial graph was interesting, but not interesting enough, so I decided to overlay it with the periods of managerial regimes (the latter data I got from Wikipedia). And this is what the all-time Elo rating of Liverpool looks like.
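If you want to reproduce a chart like this, the overlay itself is just a few lines of matplotlib – a sketch, assuming the elo dataframe from the snippet above and a hand-assembled list of reigns (the dates below are placeholders, not the actual ones):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Uses the `elo` dataframe fetched in the previous snippet.
# Managerial reigns assembled by hand (from Wikipedia); the dates here are placeholders.
reigns = [
    ("Bill Shankly", "1959-12-01", "1974-07-12"),
    ("Bob Paisley",  "1974-07-26", "1983-07-01"),
    # ...one entry per manager
]

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(elo["From"], elo["Elo"], color="red", lw=1)

for i, (manager, start, end) in enumerate(reigns):
    # Shade alternate reigns so the regimes stand out, and label each one
    ax.axvspan(pd.Timestamp(start), pd.Timestamp(end),
               color="grey", alpha=0.15 if i % 2 else 0.3)
    ax.text(pd.Timestamp(start), elo["Elo"].max(), manager,
            rotation=90, va="top", fontsize=8)

ax.set_ylabel("ClubElo rating")
ax.set_title("Liverpool FC: all-time Elo rating, by managerial reign")
plt.show()
```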

It is easy to see that the biggest improvement in the club’s performance came under the long reign of Bill Shankly (no surprises there), who took them from the Second Division to winning the old First Division. There was a brief dip when Shankly retired and his assistant Bob Paisley took over (might this be the time when Paisley got intimidated by Shankly’s frequent visits to the club, and then asked him not to come any more?), but Paisley consolidated on Shankly’s improvement to lead the club to its first three European Cups.

Around 2010, when the club was owned by Americans Tom Hicks and George Gillett and on a decline in terms of performance, this banner became popular at Anfield.

The Yanks were subsequently yanked following a protracted court battle, to be replaced by another Yank (John W Henry), under whose ownership the club has done much better. What is also interesting from the above graph is the timing of the managerial change decisions.

At the time, Kenny Dalglish’s sacking at the end of the 2011-12 season (which ended with Liverpool losing the FA Cup final to Chelsea) seemed unfair, but the Elo rating shows that the club’s rating had fallen below where it had been when Dalglish took over (initially as caretaker). Then there was a steep ascent under Brendan Rodgers (leading to a second-place finish in 2013-14), after which Suarez bit, got sold, and the team went into deep decline.

Again, we can see that Rodgers got sacked when the team had reverted to the rating that he had started off with. That’s when Jurgen Klopp came in, and thankfully so far there has been a much longer period of ascendance (which will hopefully continue). It is interesting to see, though, that the club’s current rating is still nowhere near the peak reached under Rafa Benitez (in the 2008-9 title challenge).

Impressed by the story that Elo Ratings could tell, I got data on all Premier League managers, and decided to repeat the analysis for all clubs. Here is what the analysis for the so-called “top 6” clubs returns:

We see, for example, that Chelsea’s ascendancy started not with Mourinho’s first term as manager, but towards the end of Ranieri’s term – when Roman Abramovich had made his investment. We find that Jose Mourinho actually made up for the decline under David Moyes and Louis van Gaal, and then started losing it. In that sense, Manchester United have got their sacking timing right (though they were already in decline by the time they finished last season in second place).

Manchester City also seem to have done pretty well in terms of the timing of managerial changes. And Spurs’s belief in Mauricio Pochettino, who started off badly, seems to have paid off.

I wonder why Elo Ratings haven’t made more impact in sports other than chess!

AlphaZero Revisited

It’s been over a year since Google’s DeepMind first made its splash with the reinforcement-learning-based chess engine AlphaZero. The first anniversary of AlphaZero’s release coincided with the publication of the peer-reviewed paper.

To go with the peer-reviewed paper, DeepMind has released a further 200 games played between AlphaZero and the conventional chess engine Stockfish. The set is again heavily loaded in favour of wins for AlphaZero, but it also contains six games that AlphaZero lost. I’ve been following these games on GM Daniel King’s excellent Powerplaychess channel, and I want to revise my opinion on AlphaZero.

Back then, I had looked at AlphaZero’s play through my favourite “studs and fighters” framework, which in hindsight doesn’t do full justice to AlphaZero. From the games I’ve seen from this newly released set, AlphaZero’s play hasn’t exactly been “stud”. It’s just that it’s much more “human”. And the reason AlphaZero’s play seems so human is possibly the way it “learns”.

Conventional chess engines evaluate a position by considering all possible paths (OK, not really – they use an intelligent method called alpha-beta pruning to limit the size of their search tree), and then play the move that leads to the best position at the end of the search. These engines evaluate positions using “pre-learnt human concepts”, such as point counts for the different pieces. And this leads to a certain kind of play.
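For those unfamiliar with it, alpha-beta pruning is just minimax search with cut-offs: once one reply refutes a move, the remaining replies need not be examined. A bare-bones sketch on a toy game tree (a real engine would do the same on chess positions, with a depth limit and a hand-crafted evaluation applied at the leaves):

```python
import math

def alpha_beta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta cut-offs on a toy game tree.

    A node is either a leaf score (a number) or a list of child nodes.
    A real engine does this on chess positions, cutting the search off at
    some depth and applying a hand-crafted evaluation (material count,
    king safety, ...) at the leaves.
    """
    if isinstance(node, (int, float)):
        return node

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alpha_beta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:   # the minimizing opponent will avoid this branch anyway,
                break           # so the remaining children can be pruned
        return best
    else:
        best = math.inf
        for child in node:
            best = min(best, alpha_beta(child, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best

# A two-ply toy tree: the maximizer picks a branch, the minimizer then picks a leaf.
tree = [[3, 5], [6, 9], [1, 2]]
print(alpha_beta(tree, -math.inf, math.inf, True))   # prints 6
```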

AlphaZero’s learning process, however, involves playing zillions of games against itself (since I wrote that previous post, I’ve come back up to speed with reinforcement learning). Based on the results of these games, it then evaluates, in hindsight, the positions it reached in the course of play. On top of this, it builds a deep learning model to identify the goodness of positions.

Given my limited knowledge of how deep learning works, this process involves AlphaZero learning about “features” of games that have more often than not enabled it to win. So somewhere in the network there will be a node that represents “control of the centre”. Another node deep in the network might represent “safety of the king”. Yet another might perhaps represent “an open a-file”.

Of course, none of these features has been pre-specified to AlphaZero. It has simply learnt them by training its neural network on the zillions of games it has played against itself. And while deep learning is hard to “explain”, it seems likely that the features of the game AlphaZero has learnt are remarkably similar to the “features” human players have learnt over the centuries. And it is because of the commonality in these features that we find AlphaZero’s play so “human”.
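To make that concrete with a deliberately tiny example – this is nothing like AlphaZero’s actual setup (which pairs a deep network with Monte Carlo Tree Search), just an illustration of the “label positions with eventual outcomes, then fit a model on position features” idea, using noughts-and-crosses self-play and plain logistic regression:

```python
import random
import numpy as np
from sklearn.linear_model import LogisticRegression

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return +1 or -1 if that side has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def random_selfplay_game():
    """One game with both sides moving randomly. Records every position that
    player +1 faced, and returns them along with the eventual result."""
    board, player, positions = [0] * 9, 1, []
    while True:
        if player == 1:
            positions.append(list(board))
        empty = [i for i, v in enumerate(board) if v == 0]
        if winner(board) != 0 or not empty:
            return positions, winner(board)   # +1 win, -1 loss, 0 draw
        board[random.choice(empty)] = player
        player = -player

# 1. Self-play: label every position with the game's eventual outcome (in hindsight)
X, y = [], []
for _ in range(10000):
    positions, result = random_selfplay_game()
    for pos in positions:
        X.append(pos)
        y.append(1 if result == 1 else 0)     # "did +1 go on to win from here?"

# 2. Fit a model that scores positions; its coefficients are the learnt "features"
model = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
print("learnt value of holding the centre square:", model.coef_[0][4])
print("learnt value of holding a corner square: ", model.coef_[0][0])
```

Even with random play and a linear model, the coefficient for the centre square should come out clearly positive – a toy analogue of the “features” discussed above.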

Another way to look at it is through the concept of “10,000 hours” that Malcolm Gladwell spoke about in his book Outliers. As I had written in my review of the book, the concept of 10,000 hours can be thought of as “putting fight until you get enough intuition to become stud”. AlphaZero, thanks to its large number of processors, has effectively spent far more than “10,000 hours” playing against itself, with its neural network constantly “learning” from the positions faced and the outcomes reached. And in this way, it has “gained intuition” about the features of the game that lead to wins, giving it an air of “studness”.

The interesting thing to me about AlphaZero’s play is that thanks to its “independent development” (a bit like the finches of the Galapagos), it has not been burdened by human intuition about what is good or bad, and has learnt its own heuristics. And along the way, it has come up with a bunch of heuristics that have not commonly been used by human players.

Keeping bishops on the back rank (once the rooks have been connected), for example. A stronger preference for bishops over knights than humans have. Suddenly simplifying from a terrifying-looking attack into a winning endgame (machines are generally good at endgames, so this is not that surprising). Temporary pawn and piece sacrifices. And all that.

Thanks to engines such as LeelaZero, we will soon be able to see the results of these learnings being applied to human chess as well. And human chess can only become better!

What Ails Liverpool

So Liverpool FC has had a mixed season so far. They’re second in the Premier League with 36 points from 14 games (only points dropped being draws against ManCity, Chelsea and Arsenal), but are on the verge of going out of the Champions League, having lost all three away games.

Yesterday’s win over Everton was damn lucky, down to a 96th-minute freak goal scored by Divock Origi (I’d forgotten he’s still at the club). Last weekend’s 3-0 against Watford wasn’t as comfortable as the scoreline suggested, with the scoring having been opened only midway through the second half. The 2-0 against Fulham before that was similarly a close-fought game.

Of concern to most Liverpool fans has been the form of the starting front three – Mo Salah, Roberto Firmino and Sadio Mane. The trio has missed a host of chances this season, and the team has looked incredibly ineffective in the away losses in the Champions League (the only shot on target in the 2-1 loss against PSG being the penalty that was scored by Milner).

There are positives, of course. The defence has been tightened considerably compared to last season. Liverpool aren’t leaking goals the way they did last season, and there have been quite a few clean sheets. So far, there has been no repeat of last season’s situation where they went 4-1 up against ManCity, only to quickly let in two goals and set up a tense finish.

So my theory is this – each of the front three of Liverpool has an incredibly low strike rate. I don’t know if the xG stat captures this, but the number of chances each of Mane, Salah and Firmino needs before converting one is rather high. If the average striker converts one in two chances, all of these guys convert one in four (these numbers are pulled out of thin air; I haven’t looked at the statistics).

And even during the “glory days” of last season, when Liverpool were scoring like crazy, this low strike rate remained. Instead, what helped then was a massive increase in the number of chances created. In the one game I watched live (against Spurs at Wembley), what struck me was the number of chances Salah kept missing. But as the chances kept getting created, he ultimately scored one (Liverpool lost 4-1).

What I suspect is that as Klopp decided to tighten things up at the back this season, the number of chances being created has dropped. And with the low strike rate of each of the front three, this lower number of chances translates into a much lower number of goals. If we want last season’s scoring rate, we might also have to accept last season’s concession rate (though this season’s goalkeeper is much, much better).

There ain’t no such thing as a free lunch.

Magnus Carlsen’s Endowment

Game 12 of the ongoing Chess World Championship match between Magnus Carlsen and Fabiano Caruana ended in a draw after only 31 moves, when Carlsen, in a clearly better position and clearly ahead on time, made an unexpected draw offer.

The match will now go into a series of tie-breaks, played with ever-shortening time controls, as the world looks for a winner. Given the players’ historical record, Carlsen is the favourite for the rapid playoffs. And he knows it – starting in game 11, he seemed to be playing to take the match into the playoffs.

Yesterday’s Game 12 was a strange one. It started off with a sharp Sicilian Pelikan, like games 8 and 10, and then between moves 15 and 20 the players repeated the position twice. Now, the rules of chess state that if the same position appears three times on the board, the game can be claimed as a draw. And there was a move where Caruana had the chance to repeat the position for the third time, thus drawing the game.

He spent nearly half an hour on the move, and at the end of it, he decided to deviate. In other words, no quick draw. My suspicion is that this unnerved Carlsen, who decided to then take a draw at the earliest opportunity available to him (the rules of the match state that a draw cannot be agreed before move 30; Carlsen made his offer on move 31).

In behavioural economics, Endowment Effect refers to the bias where you place a higher value on something you own than on something you don’t own. This has several implications, all of which can lead to potentially irrational behaviour. The best example is “throwing good money after bad” – if you have made an investment that has lost money, rather than cutting your losses, you double down on the investment in the hope that you’ll recoup your losses.

Another implication is that even when it is rational to sell something you own, you hold on because of the irrationally high value you place on it. The endowment effect also has an impact in pricing and negotiations – you don’t mind that “convenience charge” that the travel aggregator adds on just before you enter your credit card details, for you have already mentally “bought” the ticket, and this convenience charge is only a minor inconvenience. Once you are convinced that you need to do a business deal, you don’t mind if the price moves away from you in small marginal steps – once you’ve made the decision that you have to do the deal, these moves away are only minor, and well within the higher value you’ve placed on the deal.

So where does this fit into Carlsen’s draw offer yesterday? It was clear from the outset that Carlsen was playing for a draw. When the position was repeated twice, it raised Carlsen’s hopes that the game would be drawn, and he assumed that he was getting the draw he wanted. When Caruana refused to repeat the position, and did so after a really long think, Carlsen suddenly realised that he wasn’t getting the draw he thought he was getting.

It was as if the draw had been Carlsen’s and had now been taken away from him, so now he needed to somehow get it back. Carlsen played well after that, and Caruana played badly, and the engines clearly showed that Carlsen had an advantage by the time the game crossed move 30.

However, having “accepted” a draw earlier in the game (by repeating moves twice), Carlsen wanted to lock in the draw, rather than play on in an inferior mental state and risk a loss (which would also mean losing the Championship). And hence, despite his significantly superior position, he made the draw offer, which Caruana was only too happy to accept (given his worse position).


Hypothesis Testing in Monte Carlo

I find it incredible, and not in a good way, that I took fourteen years to make the connection between two concepts I learnt barely a year apart.

In August-September 2003, I was auditing a graduate course on Advanced Algorithms, where we learnt about randomised algorithms (I soon stopped auditing since the maths got heavy). One important class of randomised algorithms is what are known as “Monte Carlo algorithms”. Not to be confused with Monte Carlo simulations, these are randomised algorithms that give a one-way result. So, using the most prominent example of such an algorithm, you can ask “is this number prime?” and the answer can be either “maybe” or “no”.

The randomised algorithm can never conclusively answer “yes” to the primality question. If the algorithm finds evidence that the number is composite (a “witness”), it answers “no”, and this is conclusive. Otherwise it returns “maybe”. So the way you “conclude” that a number is prime is by running the test a large number of times. Each run that returns “maybe” reduces the probability that the true answer is “no” (since the runs are independent), and when the probability of “no” is low enough, you “think” it’s a “yes”. You might like this old post of mine on Monte Carlo algorithms in the context of romantic relationships.
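For concreteness, here is what such a test looks like – a sketch of a Fermat-style primality check (real implementations use the Miller–Rabin test, which handles corner cases such as Carmichael numbers, but the one-sided structure is the same):

```python
import random

def fermat_round(n):
    """One Monte Carlo round. Returns 'no' (definitely composite) or 'maybe' (prime?)."""
    a = random.randrange(2, n - 1)
    return "maybe" if pow(a, n - 1, n) == 1 else "no"

def probably_prime(n, rounds=20):
    """Run many independent rounds. A single 'no' is conclusive; twenty 'maybe's
    make compositeness very unlikely (Carmichael numbers aside)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        if fermat_round(n) == "no":
            return False        # conclusive: n is composite
    return True                 # never conclusively "yes", just very probably prime

print(probably_prime(2**31 - 1))   # a known (Mersenne) prime -> True
print(probably_prime(2**31 - 3))   # composite -> almost certainly False
```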

Less than a year later, in July 2004, as part of a basic course in statistics, I learnt about hypothesis testing. Now (I’m kicking myself for failing to see the similarity then), the main principle of hypothesis testing is that you can never “accept a hypothesis”. You either reject a hypothesis or “fail to reject” it. And if you fail to reject a hypothesis with a sufficiently high probability (basically with more data, which implies more independent evaluations that don’t say “reject”), you will start thinking about “accept”.

Basically, hypothesis testing is a one-sided test, where you are trying to reject a hypothesis. And not being able to reject a hypothesis doesn’t mean you necessarily accept it – there is still a chance of going wrong if you were to accept it (this is where we get into messy territory such as p-values). And this is exactly like Monte Carlo algorithms – one-sided algorithms where you can only conclusively take a decision one way.

So I was thinking of these concepts when I came across a headline on ESPNCricinfo yesterday that said “Rahul Johri not found guilty” (I’m not linking to it since Cricinfo has since changed the headline). The choice, or rather the ordering, of words was interesting. “Not found guilty”, it said, rather than the usual “found not guilty”.

This is again a case of one-sided testing. An investigation can either find someone guilty or it can fail to do so, and the headline in this case suggested that the latter had happened. And it soon became apparent why the headline had been constructed this way, and deliberately so – it later emerged that the decision to clear Rahul Johri of sexual harassment charges was a contentious one.

In most cases, when someone is “found not guilty” following an investigation, it suggests that the evidence on hand was enough to say that the chance of the person being guilty was rather low. The phrase “not found guilty”, on the other hand, says only that the investigation failed to establish guilt – it doesn’t convey sufficient confidence to actually clear the person.

So due credit to the Cricinfo copywriters, and due debit to the product managers for later changing the headline rather than putting out a fresh follow-up piece.

PS: The discussion following my tweet on the topic threw up one very interesting insight – that Scotland has historically had a “not proven” verdict for such cases (you can trust DD to come up with such gems).

Bridge!

While I have referred to the game of contract bridge multiple times on this blog, today was the first time since I started blogging that I actually played the game. I mean, I’ve played a few times with my computer, but today was the first time in nearly fifteen years that I actually “played”, with other humans, in a semi-competitive environment.

It happened primarily thanks to the wife, who surprised me yesterday by randomly sending me links of two bridge clubs close to home. I found that one of them was meeting this evening, and welcomed newcomers (even those without partners), and I needed no further information.

One small complication was that it had been many years since I had even played the game with my computer, or read bridge columns, and I needed to refresh my memory of the rules. Complicating matters was the fact that most players at this club use four-card-major bidding systems, while at IIT and with my computer I was used to playing five-card majors.

I installed a bridge app on my phone and played a few games, and figured that I’m not too rusty. And so after an early dinner, and leaving a wailing Berry behind (she hates it when I go out of home without her), I took the 65 bus to the club.

The club has a “host” system, where members can volunteer to play with “visitors” without partners. My host tonight was Jenny, a retired school teacher and librarian. We quickly discussed the bidding system she uses, and it was time to play.

There were some additional complications, though. For example, they use bidding boxes to convey the bids here (so that you don’t give out verbal signals while bidding), and I had never seen one before. And then on the very first hand, I forgot that bidding takes place clockwise, and bid out of turn. That early mishap apart, the game went well.

We were sitting East-West in the pairs event, which meant we moved tables after every couple of hands. Jenny introduced me to our opponents at each table, helpfully adding in most cases that I was “playing after fifteen years. He had never seen a bidding box before today”.

I think I played fairly well, as people kept asking me where I play regularly and I had to clarify that today was the first time ever I was playing in England. Jenny was a great partner, forever encouraging and making me feel comfortable on my “comeback”.

About three-fourths of the way through the session, though, I could feel myself tiring. Hard concentration for three hours straight is not something I do on a regular basis, so it was taxing on my nerves. It came to a head when a lapse in my concentration allowed our opponents to make a contract they should never have made.

Thankfully, I noticed then that there was coffee and tea available in a back room. I quickly made myself a cup of tea with milk and sugar and was soon back to form.

Jenny and I finished a narrow second among all the East-West pairs. If my concentration hadn’t flagged three-fourths of the way in, I think we might even have won our half of the event. Not a bad comeback, huh? After the event, someone told me that he would introduce me to “a very strong player who is looking for a partner”.

Oh, and did I mention that I was probably by far the youngest player there?

I’ll be back. And once again, thanks to the wife for the encouragement, and finding me this club, and taking care of Berry while I spent the evening playing!

Bankers predicting football

So the Football World Cup season is upon us, and this means that investment banking analysts are again engaging in the pointless exercise of trying to predict who will win the World Cup. The funny thing this time is that thanks to the MiFID II regulations, which prevent banking analysts from giving out reports for free, these reports aren’t in the public domain.

That means we have to rely on media reports of these reports, or on people tweeting insights from them. For example, the New York Times has summarised the banks’ predictions on the winner. And this scatter plot from Goldman Sachs will go straight into my next presentation on spurious correlations:

Different banks have taken different approaches to predicting who will win the tournament. UBS has stuck with a classic Monte Carlo simulation approach, but Goldman Sachs has gone one better and used “four different methods in artificial intelligence” to predict (for the third consecutive time) that Brazil will win the tournament.

In fact, Goldman also uses a Monte Carlo simulation, as Business Insider reports.

The firm used machine learning to run 200,000 models, mining data on team and individual player attributes, to help forecast specific match scores. Goldman then simulated 1 million possible variations of the tournament in order to calculate the probability of advancement for each squad.

But an insider in Goldman with access to the report tells me that they don’t use the phrase itself in the report. Maybe it’s a suggestion that “data scientists” have taken over the investment research division at the expense of quants.

I’m also surprised by the reporting on Goldman’s predictions. Everyone simply reports that “Goldman predicts that Brazil will win”, but surely (based on the model they’ve used) that prediction has been made with a certain probability? A better way of reporting would have been to say “Goldman predicts Brazil most likely to win, with X% probability” (and the bank’s bets desk in the UK could have placed some money on it).
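For what it’s worth, the general technique is easy to sketch. This is my own toy version, not Goldman’s model – the team strengths are made-up Elo-style numbers, the bracket is a hypothetical four-team knockout, and the win probability is a simple logistic function of the strength difference:

```python
import random

# Made-up Elo-style strengths for an imaginary four-team knockout bracket
strength = {"Brazil": 2050, "Germany": 2000, "Spain": 1980, "France": 1990}

def win_prob(a, b):
    """Elo-style probability that team a beats team b (draws ignored for simplicity)."""
    return 1 / (1 + 10 ** ((strength[b] - strength[a]) / 400))

def simulate_knockout(teams):
    """Play out one random realisation of a single-elimination bracket."""
    while len(teams) > 1:
        teams = [a if random.random() < win_prob(a, b) else b
                 for a, b in zip(teams[::2], teams[1::2])]
    return teams[0]

N = 100_000
wins = {t: 0 for t in strength}
for _ in range(N):
    wins[simulate_knockout(["Brazil", "France", "Germany", "Spain"])] += 1

for team, w in sorted(wins.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {w / N:.1%} chance of winning")   # probabilities, not a single "winner"
```

The point being that the natural output of such a simulation is a probability for each team, not a single “winner”.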

ING went rather simple with their forecast – they simply took players’ transfer values, summed them up by team, and concluded that Spain is most likely to win because their squad is the “most valued”. Now, I have two major questions about this approach. Firstly, it ignores the “correlation term” (remember the famous England conundrum of the noughties, of fitting Gerrard and Lampard into the same eleven?) and assumes that a set of strong players makes a strong team. Secondly, have they accounted for inflation? And if so, how? Player valuations (about which I have a chapter in my book) have simply gone through the roof in the last year, with Mo Salah at £35 million being considered a “bargain buy”.

Nomura also seems to have taken a similar approach, though they have in some ways accounted for the correlation term by including “team momentum” as a factor!

Anyway, I look forward to the football! That it is live on BBC and ITV means I get to watch the tournament from the comfort of my home (a luxury in England!). Also being in England means all matches are at a sane time, so I can watch more of this World Cup than the last one.


The science of shirt numbers

Yesterday, Michael Cox, author of the Zonal Marking blog and The Mixer, tweeted:

Now, there is some science to how football shirts are numbered. I had touched upon it in a very similar post I had written four years ago. You can also read this account on how players are numbered. And if you’re more curious about formations and their history, I recommend you read Jonathan Wilson’s Inverting the Pyramid.

To put it simply, number 1 is reserved for goalkeepers. Numbers 2 to 6 are for defenders, though some countries use either 4, 5 or 6 for midfielders. 7-11 are usually reserved for attacking midfielders and forwards, with 9 being the “centre forward” and 10 being the “second forward”.

Some of these numbers are so institutionalised that the number is sometimes enough to describe a player’s position and style. This has even led to jargon such as a “False Nine” (a midfielder playing furthest forward) or a “False Ten” (a striker playing in a withdrawn role).

There is less science to the allocation of shirt numbers 12 to 23, since these are not starting positions. One rule of thumb is to allocate these numbers to the backups for the corresponding positions. So 12 is the reserve goalie, 13 is the reserve right back, and so on (with 23 for the squad’s third goalkeeper).

So how have teams chosen to number their squads in the FIFA World Cup that starts next week? This picture summarises the distribution of position by number: 


There is no surprise in Number 1, which all teams have allocated to their goalkeeper, and numbers 2 and 3 are mostly allocated to defenders as well (there are some exceptions there, with Iran’s Mehdi Torabi and Denmark’s Michael Krohn-Dehli wearing Number 2 even though they are midfielders, and Iceland midfielder Samúel Friðjónsson wearing 3).

That different countries use 4, 5 or 6 for midfielders is illustrated in the data, though two forwards (Australian legend Tim Cahill and Croatia’s Ivan Perisic) puzzlingly wear 4 (it’s less puzzling in Cahill’s case since he started as a central midfielder and slowly moved forward).

7 is the right winger’s number, and depending on how that position is interpreted, it can be worn by either a midfielder or a forward. 8 is primarily a midfielder’s number, while 9 is (obviously) a striker’s. Interestingly, five midfielders will wear the Number 9 shirt (the most prominent being Russia’s Alan Dzagoev). 10 and 11 are evenly split between midfielders and forwards, though two defenders (Serbia’s Aleksandar Kolarov and Tunisia’s Dylan Bronn) also wear 11.

Beyond 11, there isn’t that much of a science, but one thing that is clear is that Cox got it wrong – for it isn’t so “textbook” to give 12 to the reserve right back. As we can see from the data, 20 teams have used that number for their reserve goalies!
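Incidentally, the tabulation behind the picture above is a one-liner once you have the squad lists. A sketch, assuming a hypothetical squads.csv with one row per player and (at least) number and position columns:

```python
import pandas as pd

# Hypothetical file: one row per player in a World Cup squad,
# with columns such as country, player, number, position (GK/DF/MF/FW)
squads = pd.read_csv("squads.csv")

# Count how many players at each shirt number play in each position
distribution = pd.crosstab(squads["number"], squads["position"])
print(distribution.loc[1:11])   # the "starting" numbers discussed above
```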

It’s like England has put their squad numbers into a little bit of a Mixer!

Chasing Dhoni

Former India captain Mahendra Singh Dhoni has a mixed record when it comes to chasing in limited overs games (ODIs and T20s). He initially built up his reputation as an expert chaser, who knew exactly how to pace an innings and accelerate at the right moment to deliver victory.

Of late, though, his chasing has been going wrong, the latest example being Chennai Super Kings’ loss at Kings XI Punjab over the weekend. Dhoni no doubt played excellently – 79 off 44 is a brilliant innings in most contexts. Where he possibly fell short was in the way he paced the innings.

And the algorithm I’ve built to represent (and potentially evaluate) a cricket match seems to have done a remarkable job in identifying this problem in the KXIP-CSK game. Now, apart from displaying how the game “flowed” from start to finish, the algorithm is also designed to pick out key moments or periods in the game.

One kind of “key period” that the algorithm tries to pick is a batsman’s innings – periods of play where a batsman made a significant contribution (either positive or negative) to his team’s chances of winning. And notice how nicely it has identified two distinct periods in Dhoni’s batting:

The first period is one where Dhoni settled down and batted rather slowly – he hit only 21 runs in 22 balls in that period, which is incredibly slow for a 10-runs-per-over game. Notice how this period of Dhoni’s batting coincides with a period when the game decisively swung KXIP’s way.

And then Dhoni went for it, hitting 36 runs in 11 balls (which is great going even for a 10-runs-per-over game), including 19 off the penultimate over bowled by Andrew Tye. While this brought CSK back into the game (to right where the game stood prior to Dhoni’s slow period of batting), it was a little too late as KXIP managed to hold on.

Now I understand I’m making an argument using one data point here, but this problem with Dhoni, where he first slows down and then goes for it with only a few overs to go, has been discussed widely. What’s interesting is how neatly my algorithm has picked out these periods!
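For the curious, here is one simple way such “key periods” can be picked out. This is only a sketch, with made-up numbers rather than my actual algorithm or the real KXIP-CSK data: treat the chasing side’s ball-by-ball win probability as a series, and find the contiguous stretch over which it moved the most in either direction (the classic maximum-subarray idea):

```python
def extreme_stretch(wp, best=True):
    """Find the contiguous stretch of balls over which win probability rose the
    most (best=True) or fell the most (best=False). This is the maximum-subarray
    (Kadane) idea applied to ball-by-ball changes in win probability."""
    deltas = [b - a for a, b in zip(wp, wp[1:])]
    if not best:
        deltas = [-d for d in deltas]
    best_sum, best_span = float("-inf"), (0, 0)
    running, start = 0.0, 0
    for i, d in enumerate(deltas):
        if running <= 0:
            running, start = d, i      # start a fresh stretch here
        else:
            running += d               # extend the current stretch
        if running > best_sum:
            best_sum, best_span = running, (start, i + 1)
    swing = best_sum if best else -best_sum
    return best_span, swing

# Made-up win probability (for the chasing side) over the balls Dhoni faced
wp = [0.45, 0.44, 0.42, 0.40, 0.37, 0.33, 0.30, 0.28,   # slow start: game drifts away
      0.30, 0.36, 0.43, 0.50, 0.55]                     # late assault: game swings back

print(extreme_stretch(wp, best=False))  # the stretch that swung the game away
print(extreme_stretch(wp, best=True))   # the stretch that brought the chase back
```

Applied to a real ball-by-ball win probability series for Dhoni’s innings, the two stretches this picks out should correspond to the slow start and the late assault described above.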