Conversation with an Afghan-Dutch taxi driver

We got back to London yesterday, and were welcomed with atypical London weather – thunderstorms. While it is common to stereotype London’s weather as being typically shitty and grey, it doesn’t normally rain all that heavily here – most of the rain that London gets is what is called “spitting rain” – slow drizzly rain best dealt with with a nice cap.

Also welcoming us was an Afghan-Dutch guy who drove us home in his Merc (we hired him through Uber). We got talking, and a few of the things he said struck me as pertinent.

  • When we told him we were from Bangalore he said something that sounded like “cooley”. At first we took it to mean that the city is cool, and then realised that wasn’t what he was saying. Then I thought he was talking about Coolie, which was filmed in Bangalore, but it wasn’t that either. Finally we realised he was talking about Virat Kohli, who plays for Royal Challengers Bangalore. It’s funny how Kohli is identified with Bangalore abroad though he’s only nominally based there, and only during the IPL season.
  • We spoke a bit about the IPL and he said he was disappointed that “our team” lost. A minute later he said the team was Sunrisers Hyderabad. For a while it wasn’t clear as to why the Sunrisers were his team. Then I realised they have two prominent Afghan players – Rashid Khan and Mohammad Nabi.
  • He was studying to be a dentist, and decided to spend time in England learning English because a lot of the dental course was in English. Apart from putting himself through formal English classes, driving an Uber was a way for him to become better at English, apart from making money. (It’s interesting how at times in our conversation he switched to using Hindi words – some of which I’m guessing are common to Pashto as well.)
  • My wife later told me that it was common for continental Europeans to spend a gap year in England learning English. And that apart from taking classes they take up jobs where they can practice the language – like driving a taxi or waiting tables.
  • The conversation also got me thinking about gap years and saving up for education – something that doesn’t at all happen in India. In India, the standard practice is to go to college immediately after school, when one is still being funded by parents. In one way, this reduces social mobility since people whose parents can’t afford college end up not studying. Also, the returns to education in India are high enough that the compensation for blue collar jobs (that one can find without a college degree) isn’t enough to fund a later degree.
  • Despite having Afghan parents, this guy has never been there. “It’s way too dangerous. I can go see relatives but will end up spending most time indoors, so not much fun”, he said.

Every time I have a conversation with a taxi driver I’m reminded of what I was told by a friend on the day I moved to Delhi in 2008. “It might be common in Bangalore to chat up auto and taxi drivers”, he had told me, “but in Delhi it is not the done thing”. I still wonder why.

A banker’s apology

Whenever there is a massive stock market crash, like the one in 1987, or the crisis in 2008, it is common for investment banking quants to talk about how it was a “1 in zillion years” event. This is on account of their models that typically assume that stock prices are lognormal, and that stock price movement is Markovian (today’s movement is uncorrelated with tomorrow’s).

In fact, a cursory look at recent data shows that what the models consider a one in zillion years event actually happens every few years or decades. In other words, while quant models do pretty well in the average case, they have thin “tails” – they underestimate the likelihood of extreme events, which lets risk build up unnoticed.
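To get a feel for how thin those tails are, here is a rough sketch of the arithmetic. It assumes daily log returns are normally distributed with a 1% daily standard deviation and 252 trading days a year; both numbers are my own illustrative assumptions, not parameters from any real bank model.

```python
import math

def normal_tail_prob(k):
    """P(Z <= -k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def years_between_events(k, trading_days_per_year=252):
    """Expected wait, in years, between daily moves of -k sigma or worse."""
    return 1.0 / (normal_tail_prob(k) * trading_days_per_year)

DAILY_VOL = 0.01  # assumed: 1% daily standard deviation of log returns

for drop in (0.03, 0.05, 0.10, 0.20):
    # Convert the one-day fall into a number of standard deviations.
    k = abs(math.log(1 - drop)) / DAILY_VOL
    print(f"{drop:.0%} one-day fall = {k:.1f} sigma, "
          f"expected once every {years_between_events(k):.3g} years")
```

Under these assumptions, a one-day fall of about a fifth (the order of magnitude seen in October 1987) comes out as something the model expects essentially never, which is where the “1 in zillion years” line comes from.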

When I decided to end my (brief) career as an investment banking quant in 2011, I wanted to take the methods that I’d learnt into other industries. While “data science” might have become a thing in the intervening years, there is still a lot for conventional industry to learn from banking in terms of using maths for management decision-making. And this makes me believe I’m still in business.

And like my former colleagues in investment banking quant, I’m not immune to the fat tail problem either – replicating solutions from one domain into another can replicate the problems as well.

For a while now I’ve been building what I think is a fairly innovative way to represent a cricket match. Basically you look at how the balance of play shifts as the game goes along. So the representation is a line graph that shows where the balance of play was at different points of time in the game.

This way, you have a visualisation that at one shot tells you how the game “flowed”. Consider, for example, last night’s game between Mumbai Indians and Chennai Super Kings. This is what the game looks like in my representation.

What this shows is that Mumbai Indians got a small advantage midway through the innings (after a short blast by Ishan Kishan), which they held through their innings. The game was steady for about 5 overs of the CSK chase, when some tight overs created pressure that resulted in Suresh Raina getting out.

Soon, Ambati Rayudu and MS Dhoni followed him to the pavilion, and MI were in control, with CSK losing 6 wickets in the course of 10 overs. When they lost Mark Wood in the 17th Over, Mumbai Indians were almost surely winners – my system reckoning that 48 to win in 21 balls was near-impossible.

And then Bravo got into the act, putting on 39 in 10 balls with Imran Tahir watching at the other end (including taking 20 off a Mitchell McClenaghan over, and 20 again off a Jasprit Bumrah over at the end of which Bravo got out). And then a one-legged Jadhav came, hobbled for 3 balls and then finished off the game.

Now, while the shape of the curve in the above graph is representative of what happened in the game, I think it went too close to the axes. 48 off 21 with 2 wickets in hand is not easy, but it’s not a 1% probability event (as my graph depicts).

And looking into my model, I realise I’ve made the familiar banker’s mistake – of assuming independence and the Markovian property. I calculate the probability of a team winning using a method called “backward induction” (that I’d learnt during my time as an investment banking quant). It’s the same approach used by WASP, the system for evaluating odds invented by a few Kiwi scientists, and as I’d pointed out in the past, WASP has the thin tails problem as well.
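For the curious, here is a minimal sketch of what backward induction looks like for a chase. It is not my actual model, nor WASP’s: the per-ball outcome probabilities below are made up for illustration, and extras, the pitch and the identity of the batsmen are all ignored. It is written as a memoised recursion, which works back from the end-of-innings states to the current one.

```python
from functools import lru_cache

# Hypothetical per-ball outcome probabilities (assumed, not fitted to data).
RUN_PROBS = {0: 0.33, 1: 0.37, 2: 0.07, 4: 0.11, 6: 0.07}
WICKET_PROB = 0.05  # wicket falls and no run is scored

@lru_cache(maxsize=None)
def win_prob(runs_needed, balls_left, wickets_in_hand):
    """Chasing side's win probability from a given match state."""
    # Terminal states: target reached, or innings over.
    if runs_needed <= 0:
        return 1.0
    if balls_left == 0 or wickets_in_hand == 0:
        return 0.0
    # Otherwise, average over the outcomes of the next ball. Every ball
    # is drawn from the same distribution, independently of the last one.
    prob = WICKET_PROB * win_prob(runs_needed, balls_left - 1,
                                  wickets_in_hand - 1)
    for runs, p in RUN_PROBS.items():
        prob += p * win_prob(runs_needed - runs, balls_left - 1,
                             wickets_in_hand)
    return prob

# The state discussed above: 48 needed off 21 balls with 2 wickets in hand.
print(f"{win_prob(48, 21, 2):.4f}")
```

The comment inside the function is the whole problem in one line: since each ball is independent of the previous one, a batsman who has just hit three sixes is treated exactly like one who has just walked in, and hot streaks of the Bravo variety get priced far too low.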

As Seamus Hogan, one of the inventors of WASP, had pointed out in a comment on that post, one way of solving this thin tails issue is to control for the pitch or regime, and I’ve incorporated that as well (using a Bayesian system to “learn” the nature of the pitch as the game goes on). Yet, I see I struggle with fat tails.

I seriously need to find a way to incorporate serial correlation into my models!

That said, I must say I’m fairly kicked about the system I’ve built. Do let me know what you think of this!

PM’s Eleven

The first time I ever heard of Davos was in 1997, when then Indian Prime Minister HD Deve Gowda attended the conference in the ski resort and gave a speech. He was heavily pilloried by the Kannada media, and given the moniker “Davos Gowda”.

Maybe because of all the attention Deve Gowda received for the trip, and not in a good way, no Indian Prime Minister ventured to go there for another twenty years. Until, of course, Narendra Modi went there earlier this week and gave a speech that apparently got widely appreciated in China.

There is another thing that connects Modi and Deve Gowda as Prime Ministers (leaving aside trivialities such as them being chief ministers of their respective states before becoming Prime Ministers).

Back in 1996 when Deve Gowda was Prime Minister, Rahul Dravid, Venkatesh Prasad and Sunil Joshi made their Test debuts (on the tour of England). Anil Kumble and Javagal Srinath had long been fixtures in the Indian cricket team. Later that year, Sujith Somasunder played a couple of one dayers. David Johnson played two Tests. And in early 1997, Doddanarasaiah Ganesh played a few Test matches.

In case you haven’t yet figured out, all these cricketers came from Karnataka, the same state as the Prime Minister. During that season, it was normal for at least five players in the Indian Eleven to be from Karnataka. Since Deve Gowda had become Prime Minister around the same time, there was no surprise that the Indian cricket team was called “PM’s Eleven”. Coincidentally, the chairman of selectors at that point in time was Gundappa Vishwanath, who is also from Karnataka.

The Indian team playing in the current Test match in Johannesburg has four players from Gujarat. Now, this is not as noticeable as five players from Karnataka because Gujarat is home to three Ranji Trophy teams. Cheteshwar Pujara plays for Saurashtra, Parthiv Patel and Jasprit Bumrah play for Gujarat, and Hardik Pandya plays for Baroda. And Saurashtra’s Ravindra Jadeja is also part of the squad.

It had been a long time since one state had thus dominated the Indian cricket team. Perhaps we hadn’t seen this kind of domination since Karnataka’s in the late 1990s. And it so happens that once again the state dominating the Indian cricket team is the Prime Minister’s home state.

So after a gap of twenty one years, we had an Indian Prime Minister addressing Davos. And after a gap of twenty one years, we have an Indian cricket team that can be called “PM’s Eleven”!

As Baada put it the other day, “Modi is the new Deve Gowda. Just without family and sleep”.

Update: I realised after posting that I have another post called “PM’s Eleven” on this blog. It was written in the UPA years.

Duckworth Lewis Book

Yesterday at the local council library, I came across this book called “Duckworth Lewis” written by Frank Duckworth and Tony Lewis (who “invented” the eponymous rain rule). While I’d never heard about the book, given my general interest in sports analytics I picked it up, and duly finished reading it by this morning.

The good thing about the book is that though it’s in some way a collective autobiography of Duckworth and Lewis, they restrict their usual life details to a minimum, and mostly focus on what they are famous for. There are occasions when they go into too much detail describing a trip to either Australia or the West Indies, but it’s easy to filter out such stuff and read the book for the rain rule.

Then again, it isn’t a great book. If you’re not interested in cricket analytics there isn’t that much for you to know from the book. But given that it’s a quick read, it doesn’t hurt so much! Anyway, here are some pertinent observations:

  1. Duckworth and Lewis didn’t get paid much for their method. They managed to get the ICC to accept their method sometime in the mid 90s, but it wasn’t until the early 2000s, by which time Lewis had become a business school professor, that they managed to strike a financial deal with the ICC. Even then, they make it sound like they didn’t make much money off it.
  2. The method came about when Duckworth quickly put together something for a statistics conference he was organising, where another speaker who was supposed to speak about cricket pulled out at the last minute. Lewis later came across the paper, and then got one of his undergrad students to do a project about it. The two men subsequently collaborated.
  3. It’s amazing (not in a positive way) the kind of data that went into the method. Until the early 2000s, the only dataset that was used to calibrate the method was what was put together by Lewis’s undergrad. And this was mostly English County games, played over 40, 55 and 60 overs. Even after that, the frequency of updation with new data (which reflects new playing styles and strategies) is rather low.
  4. The system doesn’t seem to have been particularly well software engineered – it was initially simply coded up by Duckworth, and until as late as 2007 it ran on the DOS operating system. It was only in 2008 or so, when Steven Stern joined the team (now the method is called DLS to include his name), that a windows version was introduced.
  5. There is very little discussion of alternate methods, and though there is a chapter about them, Duckworth and Lewis are rather dismissive. For example, another popular method is by this guy called V Jayadevan from Thrissur. Here is some excellent analysis by Srinivas Bhogle where he compares the two methods. Duckworth and Lewis spend a couple of pages listing a couple of scenarios where Jayadevan’s method doesn’t work, and then spend a paragraph disparaging Bhogle for his support of the VJD method.
  6. This was the biggest takeaway from the book for me – the Duckworth Lewis method doesn’t equalise probabilities of victory of the two teams before and after the rain interruption. Instead, the method equalises the margin of victory between the teams before and after the break. So let’s say a team was 10 runs behind the DL “par score” when it rains. When the game restarts, the target is set such that the team is still 10 runs behind the par score! They make an attempt to explain why this is superior to equalising probabilities of winning, but don’t go too far with it.
  7. The adoption of Duckworth Lewis seems like a fairly random event. Following the World Cup 1992 debacle (when South Africa’s target went from 22 off 13 to 22 off 1 ball after a rain break), there was a demand for new rain rules. Duckworth and Lewis somehow managed to explain their method to the ECB secretary. And since it was superior to everything that was there then, it simply got adopted. And then it became incumbent, and became hard to dislodge!
  8. There is no mention in the book about the inherent unfairness of the DL method (in that it can be unfair to some playing styles).
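The margin-preserving property in point 6 is easy to check with a toy version of the resources calculation. In the sketch below the resource curve, its decay constant and the match situation are all my own hypothetical stand-ins (the real tables are calibrated on match data); the only parts taken from the method are that the chasing side’s target is scaled by the resources available to it, and that par at any point is the first-innings total scaled by the resources used so far.

```python
import math

DECAY = 0.04  # hypothetical decay constant, chosen only for illustration

def resources_left(overs_left, wickets_lost):
    """Toy resource curve: % of a full 50-over innings still available."""
    if wickets_lost >= 10:
        return 0.0
    strength = (10 - wickets_lost) / 10
    full = 1 - math.exp(-DECAY * 50)  # normalises 50 overs, 0 down to 100%
    return 100 * strength * (1 - math.exp(-DECAY * overs_left / strength)) / full

def par_score(team1_total, total_resources, resources_remaining):
    """Runs the chasing side needs to be level, given resources used so far."""
    return team1_total * (total_resources - resources_remaining) / 100

team1_total = 250  # made-up first-innings score from a full 50 overs
score = 90         # made-up chasing score when the rain arrives

before = resources_left(30, 3)        # 30 overs left, 3 down, as rain starts
after = resources_left(20, 3)         # 10 overs lost, so 20 left on resumption
total_after = 100 - (before - after)  # chasing side's total resources shrink

margin_before = score - par_score(team1_total, 100, before)
margin_after = score - par_score(team1_total, total_after, after)
revised_score_to_tie = team1_total * total_after / 100

print(f"margin vs par before rain: {margin_before:.1f}")
print(f"margin vs par after rain:  {margin_after:.1f}")
print(f"revised score to tie: {revised_score_to_tie:.1f}")
```

The two margins come out the same whatever curve you plug in, because the resources lost to rain are removed from both the total and the remainder and so cancel out of the par calculation. What changes is the number of balls left in which to make up (or defend) that margin, which is why the win probability need not stay the same.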

Ok this is already turning out to be a long post, but one final takeaway is that there’s a fair amount of randomness in sports analytics, and you shouldn’t get into it if your only potential customer is a national sporting body. In that sense, developments such as the IPL are good for sports analytics!

Mike Hesson and cricket statistics

While a lot is made of the use of statistics in cricket, my broad view based on presentation of statistics in the media and the odd player/coach interview is that cricket hasn’t really learnt how to use statistics as it should. A lot of so-called insights are based on small samples, and coaches such as Peter Moores have been pilloried for their excess focus on data.

In this context, I found this interview with New Zealand coach Mike Hesson in ESPNCricinfo rather interesting. From my reading of the interview, he seems to “get” data and how to use it, which helps explain the New Zealand cricket team’s general over-performance relative to expectations in the last few years.

Some snippets:

You’re trying to look at trends rather than chuck a whole heap of numbers at players.

For example, if you look at someone like Shikhar Dhawan, against offspin, he’s struggled. But you’ve only really got a nine or ten-ball sample – so you’ve got to make a decision on whether it’s too small to be a pattern

Also, players take a little while to develop. You’re trying to select the player for what they are now, rather than what their stats suggest over a two or three-year period.

And there are times when you have to revise your score downwards. In our first World T20 match, in Nagpur, we knew it would slow up,


Go ahead and read the whole thing.
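Hesson’s point about a nine or ten-ball sample can be made concrete with a confidence interval. Here is a small sketch using the Wilson score interval for a proportion. The counts are hypothetical (they are not Dhawan’s actual numbers), and what counts as a “bad ball” for the batsman is left to the analyst.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for an event rate of events/n."""
    if n <= 0:
        raise ValueError("need at least one observation")
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Hypothetical: the batsman was troubled on 2 of the 10 balls he faced,
# versus the same 20% rate observed over 1000 balls.
for events, n in ((2, 10), (200, 1000)):
    low, high = wilson_interval(events, n)
    print(f"{events}/{n}: between {low:.1%} and {high:.1%}")
```

With ten balls the plausible range for the true rate is so wide that “he struggles against offspin” and “he’s perfectly fine against offspin” are both consistent with the data, which is presumably why Hesson treats it as a judgment call rather than a pattern.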

On cricket writing

This piece where Suveen Sinha of the Hindustan Times calls out Dhoni’s “joke” with respect to retirement has an interesting tailpiece:

When Dhoni was bantering with the Australian, the other journalists in the hall were laughing. They would, no sports journalist would want to be anything but nice to the formidable Indian captain. That’s why this piece had to be written by someone whose day job is to write on business and economy.

Looking at the reports of the incident from both Sinha’s and ESPNCricinfo’s standpoints, it is clear to me that Sinha’s view is more logical: Dhoni’s calling the journalist to the press conference table and cross-questioning him was unprofessional on the one hand, and showed his lack of defences on the other.

Yet, the ending to Sinha’s piece also explains why other sports journalists have taken to lauding Dhoni’s view rather than criticising him – for them, access to the Indian limited overs captain is important, and they wouldn’t like to damage that by taking an Australian colleague’s side.

The problem with a lot of sports journalism in general, and Indian cricket journalism in particular, is that jingoism and support for one’s team trumps objective reporting and analysis. One example of this was coverage from Indian and Australian newspapers of the Monkeygate scandal in 2007-08 (when Harbhajan Singh called Andrew Symonds a monkey).

More recently, there was the controversy about India losing games because of the tendency of Rohit Sharma (and Indian batsmen in general) to slow down in their 90s. Again, commentary about that took jingoistic tones, with the Indian sports media coming out strongly in favour of Sharma. There were reports defending his “commitment” and “grit” and all such flowery language sports journalists love, and claiming that Glenn Maxwell’s comment was entirely unwarranted. Maxwell even backed down on his comments.

Data, however, showed that Maxwell need not have backed down on his comments. Some analysis based on ball-by-ball data that I published in Mint showed clearly that Indian batsmen do slow down in their 90s, and of all recent players, Sharma was the biggest culprit.

Indian batsmen slowing down in their 90s. My analysis for Mint
Rohit Sharma is among the biggest culprits in terms of slowing down in the 90s
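For anyone who wants to try this kind of analysis at home, the core of it is just strike rate bucketed by the batsman’s score at the time of each ball. Below is a bare-bones sketch of that idea; it is not the exact code behind the Mint piece, the deliveries in it are made up, and it ignores details such as wides and no-balls.

```python
from collections import defaultdict

def strike_rate_by_band(balls, band_width=10):
    """Strike rate (runs per 100 balls) by score band.

    `balls` is an iterable of (score_before_ball, runs_off_ball) pairs
    for a single batsman.
    """
    runs = defaultdict(int)
    faced = defaultdict(int)
    for score_before, runs_off_ball in balls:
        band = (score_before // band_width) * band_width
        runs[band] += runs_off_ball
        faced[band] += 1
    return {band: 100 * runs[band] / faced[band] for band in sorted(faced)}

# Made-up deliveries, purely to show the shape of the calculation.
sample = [(78, 4), (82, 1), (83, 4), (87, 1), (88, 2),
          (90, 0), (90, 1), (91, 0), (91, 1), (92, 0), (92, 2), (94, 1)]

for band, rate in strike_rate_by_band(sample).items():
    print(f"{band}s: strike rate {rate:.1f}")
```

Run over a batsman’s full ball-by-ball record, comparing the 90s band with the 80s band gives the “slowing down” number; doing it across many innings, rather than a handful of balls as here, is what makes the comparison meaningful.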

The piece was a hit and was widely shared on social media. What was more interesting, however, was the pattern in which it was shared. For one, the editors at Mint loved it and shared it widely. It was also shared widely by mango people and people with a general interest in cricket.

The class of people which was conspicuous by its absence of commentary on my piece was sports journalists. While it could be reasoned that they didn’t see the piece (appearing as it did in a business publication, though I did send emails to some of them), my reasoning is that this piece didn’t gain much traction among them because it didn’t fit their priors, and didn’t fit the jingoistic narrative they had been building.

It is not necessary, though, that someone only shares pieces that they completely agree with – it is a fairly common practice to share (and abuse) pieces which you vehemently disagree with. The commentary I found about this piece was broadly positive – few people who had shared the piece disagreed with it.

My (untested) hypothesis on this is that this analysis flew in the face of all that mainstream sports journalists had been defending over the previous few days – that Maxwell’s comments were simply not true, or that Sharma was a committed cricketer, and all such hyperbole. With data being harder to refute (only option being to poke holes in the analysis, but this analysis was rather straightforward), they chose to not give it further publicity.

Of course, I might be taking too much credit here, but that doesn’t take away from the fact that there is a problem with sports (and more specifically, cricket) writing. Oh, and as for the ultra-flowery language, I’ll save my comments for another day and another post.



Super-specialisation in cricket

Cricket has always been a reasonably specialised sport. You are either a batsman or a bowler or a wicketkeeper or an all-rounder. If you’re a bowler, you’re classified based on your bowling arm, the speed at which you bowl and the spin you impart on the ball (the last two are not independent). If you’re a batsman you’re classified based on your batting stance and whether you’re an opener or a middle-order batsman.

In Test cricket, there’s further specialisation if you’re a middle-order batsman. You have specialist Number Threes, like Rahul Dravid or Ricky Ponting. You have specialist Number Fours, like Sachin Tendulkar or Younis Khan. Five and six are fungible, but a required ability for both these positions is the ability to bat with the tail.

In One Day cricket, too, there’s some degree of specialisation within the middle order but it’s not to the same extent as in Test Cricket. In One Day cricket, batting orders are more flexible and situation-based. You do have specialist threes (Dravid and Ponting again come to mind) and sixes (usually hitters) but the super-specialisation is not as much as in Test Cricket.

A logical extension of this would be that in T20 cricket, which is played over an even shorter duration and where batting orders are even more flexible, you don’t need even as much specialisation as in ODIs. However, Siddharth Monga argues in this piece that this lack of specialisation is why India isn’t doing as well as it could in T20s (having just lost the home series to South Africa).

In other words, what Monga is arguing is that Kohli, Raina and Sharma are all similar batsmen and effectively Number Threes for their IPL franchises, and when they are arranged 2-4 or 3-5 in the Indian national team, two of them are effectively batting out of position.

It would be interesting if Monga is indeed right and that T20s require a higher degree of specialisation than ODIs. It is also interesting that India’s number 6, MS Dhoni, bats like a typical number 5 in T20s, accumulating for a while before going bonkers. Maybe T20 will end up as a much more specialised sport than Tests? That would be interesting to watch.