A couple of months back, I was watching a One Day International match between New Zealand and India. It was during this series that the WASP algorithm for predicting the score at the end of the innings (for the team batting first) and the chances of victory (for the team batting second) first came into public prominence. During the games, the scorecard would include these figures, which were derived from the WASP algorithm.
In one game that I was watching (I think it was the fourth match of the series), New Zealand was chasing. They were fifty-odd without loss, and WASP showed their chances of winning as somewhere around 60%. Then Jesse Ryder got out. Suddenly, the chances of winning, as shown on screen, dropped to the forties! How can the fall of one wicket, and at an early stage in the game, influence a game so much? The problem is with the algorithm.
Last year, during the IPL, I tried running a graphic that I called the “Tug-of-War”, which was meant to depict how the game swung between the two teams. Taking the analogy forward, if you were to imagine the game of cricket as a game of Tug-of-War, the graph plotted the position of the middle of the rope as a function of time. Here is a sample graphic from last year:
This shows how the game between the Pune Warriors and Sun Risers “moved”. The upper end of the graph represents the Sun Risers’ “line” and the lower end the line of the Pune Warriors. So we can see from this graph that for the first half of the Sun Risers innings (they batted first), the Sun Risers were marginally ahead. Then Pune pulled it back in the second half, and then somewhere midway through the Pune innings, the Sun Risers pulled it back again, and eventually won the game.
At least that’s the intention with which I started putting out this graphic. In practice, you can see that there is a problem. Check out the graph somewhere around the 8th over of the Pune innings. This was when Marlon Samuels got out. How can one event change the course of the game so dramatically? It was similar to the movement in the WASP when Ryder got out in the recent NZ-India match.
So what is the problem here? Comparing the WASP algorithm that the designers have kindly published with the algorithm I used for last year’s IPL (which was Monte Carlo-based), the one thing they have in common is that both are Markovian (I know mine is, and from what WASP has put out, I’m guessing theirs is, too). To explain in English, what our algorithms assume is that what happens on the next ball doesn’t depend on how the innings has unfolded so far, beyond the current state of the game (such as balls remaining and wickets in hand). The odds of different events on the next ball (dot, six, single, out, etc.) are independent of how the previous balls have shaped up. And since that doesn’t accurately represent what happens in a cricket match, we end up with “thin tails”.
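To make the assumption concrete, here is a minimal sketch of a Monte Carlo innings simulator in its simplest possible form, where every ball is drawn from the same fixed distribution. This is not my IPL code and it is not WASP (which, as far as I can tell, lets the odds vary with balls and wickets remaining). The outcome probabilities and the function names are made up purely for illustration.

```python
import random
import statistics

# Hypothetical per-ball outcome probabilities (illustrative, not fitted to data).
# "W" is a wicket; the numbers are runs scored off the ball.
OUTCOMES = ["W", 0, 1, 2, 3, 4, 6]
WEIGHTS = [0.025, 0.50, 0.32, 0.05, 0.005, 0.08, 0.02]


def simulate_innings(rng, balls=300, wickets=10):
    """Play one innings in which every ball is an independent draw."""
    score = 0
    wickets_left = wickets
    for _ in range(balls):
        outcome = rng.choices(OUTCOMES, weights=WEIGHTS)[0]
        if outcome == "W":
            wickets_left -= 1
            if wickets_left == 0:
                break  # all out, innings ends early
        else:
            score += outcome
    return score


def simulate_many(n=2000, seed=1):
    """Simulate n innings with a seeded generator so runs are repeatable."""
    rng = random.Random(seed)
    return [simulate_innings(rng) for _ in range(n)]


if __name__ == "__main__":
    scores = simulate_many()
    print("mean score:", round(statistics.mean(scores), 1))
    print("std dev:   ", round(statistics.pstdev(scores), 1))
```

Because each ball ignores the history of the innings, good and bad balls tend to cancel each other out over 300 deliveries, and the simulated totals bunch up around the mean.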
Recently, to evaluate IPL matches, with a view to evaluating players ahead of the auction, I reverse-engineered the WASP algorithm, and decided to see what it says about the score at the end of an ODI innings. Note that my version is team agnostic, and assumes that every ball is bowled by “the average bowler” to “the average batsman”. The distribution of team score at the end of the first innings, as calculated by my algorithm, is shown by the blue line in the graph below. The red line shows the actual distribution of score at the end of an ODI innings in the last 5 years (same data that’s been used to construct the model).
Note how the blue curve has a much higher peak, and tails off very quickly on either side. In other words, a lot of “mass” is situated within a small range of scores, and this leads to bizarre situations such as the one you can see in the first graph, and the one I saw in the New Zealand-India game.
The problem with a dynamic programming based approach, such as WASP, is that you need to make a Markovian assumption, and that assumption results in thin tails. And when you are trying to predict the probability of victory, and are using a curve such as the blue one above as your expected distribution of score at the end of the innings, events such as a six or a wicket can drastically alter your calculated odds.
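A rough back-of-the-envelope sketch shows why the width of the curve matters so much. Assume, purely for illustration, that the chasing team’s final score is normally distributed around its expected score, and that a wicket knocks a few runs off that expectation. The target, the expected scores and the two spreads below are all hypothetical numbers, and the normal shape is itself an assumption rather than anything WASP or my model uses.

```python
from statistics import NormalDist


def chance_of_reaching(target, expected_score, spread):
    """P(final score >= target) under a normal approximation (an assumption)."""
    return 1.0 - NormalDist(mu=expected_score, sigma=spread).cdf(target)


def swing_from_event(target, expected_before, expected_after, spread):
    """Win probability before and after an event shifts the expected score."""
    before = chance_of_reaching(target, expected_before, spread)
    after = chance_of_reaching(target, expected_after, spread)
    return before, after


if __name__ == "__main__":
    # Hypothetical chase of 250 where a wicket moves the expected score
    # from 254 to 246; only the spread of the distribution differs.
    for label, spread in [("thin tails", 15), ("fat tails", 45)]:
        before, after = swing_from_event(250, 254, 246, spread)
        print(f"{label}: {before:.0%} before, {after:.0%} after")
```

With the narrow spread, the same eight-run shift moves the chance of winning from roughly 60% to roughly 40%, while with the wide spread it only moves from about 54% to about 46%.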
To improve the cricket prediction system, what we need is an algorithm that can replicate the “fat tails” that the actual distribution of cricket scores shows. My current Monte Carlo based algorithm doesn’t cut it. Neither does the WASP.
Thank you for posting this thoughtful analysis. As one of the co-authors of the WASP, I can confirm and clarify much of your analysis.
You are right that WASP is a Markovian algorithm. I don’t think that a DP approach has to necessarily be Markovian. Indeed, Scott and I discussed at length possible different ways of incorporating some non-Markovian aspect into the algorithm (and also keeping it Markovian but expanding the state space) but decided that the value from the possible additional accuracy would be outweighed by a very large increase in the complexity of the algorithm.
What motivated us to think about this was something similar to what you have noted: In the second innings, a probit regression of the eventual result on the WASP winning probability showed that the WASP estimates were biased away from 50%, which we figured was due to some Markovian aspect of the process, or equivalently, due to the assumption that every delivery is an independent event. That, however, was in an early form of the model that (implicitly) assumed that every game was played in the average batting conditions. The bias is heavily reduced once one controls for conditions, for the simple reason that a variable that applies to every delivery in an innings (i.e. an indicator of the ease of batting), is an extreme example of deliveries not being independent events. Furthermore, any remaining dependence is likely to come from things that WASP, by construction, is designed to ignore: namely information about the quality of the particular batsmen and bowlers who will play out the remaining deliveries.
This brings me to your graph of actual versus WASPish first innings totals. You say that your model simulation was team agnostic (as WASP is designed to be) pitting average batsmen against average bowlers. I am guessing (but please correct me if I am wrong) that it also assumed the game was played on the average pitch. The real-world data, in contrast had a whole variety of batting conditions and the full variety of team quality. That will automatically give a higher variance in first innings scores than the model.
Thanks for the detailed comment!
Yes, I did assume that the game was also played on an “average pitch”. And I agree with you that once we take into account the pitch and the “conditions” (or some proxy of that), we can get more realistic distributions using a WASP-ish model.
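A quick sketch of that effect: simulate one set of innings where every match is played on the same “average pitch”, and another where each match draws its own pitch, and compare the spread of totals. Everything here is hypothetical, including the run weights, the three pitch types and the idea of modelling a pitch by changing the dot-ball weight. Wickets are ignored to keep it short, so this is neither WASP nor my model.

```python
import random
import statistics

RUNS = [0, 1, 2, 4, 6]
# Hypothetical weights for scoring 1, 2, 4 and 6 off a ball (illustrative only).
SCORING_WEIGHTS = [0.33, 0.06, 0.09, 0.02]


def innings_total(rng, dot_ball_weight, balls=300):
    """Total for one innings; a lower dot-ball weight means an easier pitch."""
    weights = [dot_ball_weight] + SCORING_WEIGHTS
    return sum(rng.choices(RUNS, weights=weights, k=balls))


def simulate(pitches, n=1000, seed=7):
    """Each match picks one pitch type, which then applies to every ball."""
    rng = random.Random(seed)
    return [innings_total(rng, rng.choice(pitches)) for _ in range(n)]


if __name__ == "__main__":
    same_pitch = simulate([0.50])                # always the "average pitch"
    varied_pitch = simulate([0.40, 0.50, 0.60])  # easy, average, hard
    print("std dev, average pitch only:", round(statistics.pstdev(same_pitch), 1))
    print("std dev, varied pitches:    ", round(statistics.pstdev(varied_pitch), 1))
```

The balls within a match are still independent given the pitch, but because the pitch is shared by all 300 of them, the totals across matches spread out more than in the single-pitch case.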
Any model simulates means, and hence by default has thinner tails than reality.
If you want something more representative of the extremes, you need separate models for the right tail and the left tail.