## Record of my publicly available work

A few people who I’ve spoken to as part of my job hunt have asked to see some “detailed descriptions” of work that I’ve done. The other day, I put together an email with some of these descriptions. I thought it might make sense to “document” it in one place (and for me, the “obvious one place” is this blog). So here it is. As you might notice, this takes the form of an email.

I’m putting together links to some of the publicly available work that I’ve done.
1. Cricket
I have a model to evaluate and “tell the story of a cricket match”. This works for all limited overs games, and is based on a dynamic programming algorithm similar to the WASP. The basic idea is to estimate the odds of each team winning at the end of each ball, and then chart that out to come up with a “match story”.
And through some simple rules-based intelligence, the key periods in the game are marked out.
The model can also be used to evaluate the contributions of individual batsmen and bowlers towards their teams’ cause, and when aggregated across games and seasons, can be used to evaluate players’ overall contributions.
Here is a video where I explain the model and how to interpret it:
The algorithm runs live during a game. You can evaluate the latest T20 game here:
Here is a more interactive version, including a larger selection of matches going back in time.
Related to this is a cricket analytics newsletter I actively wrote during the World Cup last year. Most Indians might find this post from the newsletter interesting:
2. Covid-19
At the beginning of the pandemic (when we had just gone under a national lockdown), I had built a few agent based models to evaluate the risk associated with different kinds of commercial activities. They are described here.
Every morning, a script that I have written parses the day’s data from covid19india.org and puts out some graphs to my Twitter account. This is a daily, fully automated feature.
Here is another agent based model that I had built to model the impact of social distancing on covid-19.
A tweetstorm based on Bayes’ Theorem that I wrote during the pandemic went viral enough that I got invited to a prime time news show (I didn’t go).
3. Visualisations
I used to collect bad visualisations.
I also briefly wrote a newsletter analysing “good and bad visualisations”.
4. I have an “app” to predict which single malts you might like based on your existing likes. This blogpost explains the process behind (a predecessor of) this model.
5. I had some fun with machine learning, using different techniques to see how they perform in terms of predicting different kinds of simple patterns.
6. I used to write a newsletter on “the art of data science”.
In addition to this, you can find my articles for Mint here. Also, this page on my website has links to some anonymised case studies.

I guess that’s a lot? In any case, now I’m wondering if I did the right thing by choosing “skthewimp” as my Github username.

## Finite and infinite cricket games

I’ve written about James Carse’s Finite and Infinite Games here before. It is among the more influential books I’ve read, though it’s a bit of a weirdly written book, almost in a constant staccato tone.

From one of my previous posts:

One of the most influential books I’ve read is James Carse’s Finite and Infinite Games. Finite Games are artificial games where we play to “win”. There is a defined finish, and there is a set of tasks that we need to achieve that constitutes “victory”. Most real-life games, on the other hand, are “infinite games” where the objective is simply to ensure that the game goes on.

I’ve spent most of this evening watching The Test, the Amazon Prime documentary about the Australian cricket team after Sandpapergate. It’s a good half-watch. Parts of it demand a lot of attention, but overall it’s a nice “background watch” while I’m doing something else.

In any case, the reason for writing the post is this little interview of Harsha Bhogle somewhere in the middle of this documentary (he has appeared several times more after this one). In this bit, he talks about how in Test cricket, the opponent might be having a good time for a while, but it is okay to permit him that. To paraphrase Gully Boy, “apna time aayega” – the bowler or batsman in question will tire or diminish after some time, after which you can do your business.

He went on to say that this is not the case in limited overs cricket (ODIs and T20s) where both batsmen and bowlers need to constantly look to dominate, and cannot simply look to “survive” when an opponent is on the roll.

While Test cricket is strictly not an “infinite game” (it needs to end in five days), I thought this was a beautiful illustration of the concept of finite and infinite games. The objective of an infinite game, as James Carse describes in his book, is to just continue to play the game.

As a batsman in Test cricket, you look to just be there, weather out the good spells and spend time at the crease. You do this and the runs will come (it is analogous for bowlers – you need to bowl well enough to continue to be in the game, and then when the time comes you will get your rewards).

In ODIs and T20s, you cannot bide your time. Irrespective of how the opponent is playing, you need to “win every moment”, which is the premise for a finite game.

Now, I don’t know what I’m getting at here, and what the point of this post is, but I think I just liked Harsha Bhogle’s characterisation of Tests as infinite games, and wanted to share that with you.

## Gully Cricket With A Test Cricketer

Long, long ago, I’d written a post comparing gully cricket with baseball. This was based on my experience playing cricket in school, on roads next to friends’ houses, in the gap between my house and the next, and even the gap between rows of desks in my school classroom.

I hadn’t imagined all this gully cricket experience would come in useful in any manner. Until a few weeks back, when Siddhartha Vaidyanathan asked me to join him in this episode of the “81 all out” podcast. The “main guest” on this show was Test cricketer Vijay Bharadwaj, whose Test debut, you might remember, ended in “83 all out”.

It was a fascinating conversation, and I loved being part of it. I realised that the sort of gully cricket I played was nothing like the sort that Vijay played. As I mention in the podcast, I “never graduated from the road to the field”.

Unfortunately I wasn’t able to put my fundaes on baseball, and other theories I’ve concocted about Gully Cricket. Nevertheless, I had fun recording this, and I think you’ll have fun listening to it as well. You can listen to it here, or on any of your usual podcast tools (search for “81 all out”).

## Amazon and Sony Liv

Amazon is pretty bad at design of products they’re not pioneers in. They’ve built a great shopping engine (25 years ago) and a great cloud service (15 years ago), but these were both things they were pioneers in.

Amazon being Amazon, however, they have a compulsive need to be in pretty much every industry, and so they’ve launched clones of lots of other businesses. However, their product design in these is far from optimal, and the user experience is generally very underwhelming.

Prime Video has a worse user experience than Netflix. The search function is much worse. The machine learning (for recommendations) isn’t great. The X-ray is good, but overall I don’t have as pleasant a time watching Prime as I do with Netflix.

However, the degree to which Prime Video is worse than Netflix is far far smaller than the degree to which Amazon Music is worse than Spotify. The only thing going for Amazon Music (which I only use because it comes free with my prime delivery membership in India) is that they have inventory.

Spotify in India has been unable to secure rights to a lot of classic rock and metal bands, such as Iron Maiden and Black Sabbath and Led Zeppelin and Dream Theater. And these form a heavy part of my routine listening. And so I’m forced to use Amazon Music (Apple Music has these bands as well, but I have to pay extra for that).

The product (Amazon Music) is atrocious. The learning is next to nothing. After five months of using the service to exclusively listen to Classic Rock and Heavy Metal, and zero Indian music, the home page still recommends to me Bollywood, Punjabi and Tamil stuff! History is not properly maintained. Getting to the album or playlist (the less said about playlists on Amazon, the better) I want takes way too much more effort than it does on Spotify.

In other words, the only thing that keeps Amazon going in businesses they’re not pioneers in is inventory – Prime Video works because it has movies and shows other streaming services don’t have. Amazon Music is used because it has music that Spotify doesn’t.

I figured it is a similar case with Sony Liv, Sony’s streaming service in India. They sit on a bunch of lucrative monopolies, such as rights to broadcasting Test cricket in a lot of countries (all three Test series being played right now are on Sony, for example), Champions League football and so on. Beyond that it’s an atrocity to watch them.

I remember missing a goal in the Liverpool-Porto Champions League quarterfinal because of a temporary power cut. There was no way in the broadcast to go back and see the goal. If I by mistake pause for a couple of seconds, I’m forever behind “live” (unless I refresh). Yesterday during the classic Ashes Test, the app simply gave up when I tried to load the game.

The product is atrocious (actually more atrocious than Amazon Music), but people are forced to use it only because they have a monopoly on content. And in that way, it is similar to Amazon, which can get away with atrocious products only because they have the inventory!

I’m glad the Premier League is on Hotstar, which is mostly a pleasure to watch! (actually back in the day when I had cable TV, the star sports bouquet had significantly superior production values to the sony-zee-ten bouquet)

## Good vodka and bad chicken

When I studied Artificial Intelligence, back in 2002, neural networks weren’t a thing. The limited compute capacity and storage available at that point in time meant that most artificial intelligence consisted of what is called “rule based methods”.

And as part of the course we learnt about machine translation, and the difficulty of getting the implicit meaning across. The favourite example by computer scientists in that time was the story of how some scientists translated “the spirit is willing but the flesh is weak” into Russian using an English-Russian translation software, and then converted it back into English using a Russian-English translation software.

The result was “the vodka is excellent but the chicken is not good”.

While this joke may not be valid any more thanks to the advances in machine translation, aided by big data and neural networks, the issue of translation is useful in other contexts.

Firstly, speaking in a language that is not your “technical first language” makes you eschew jargon. If you have been struggling to get rid of jargon from your professional vocabulary, one way to get around it is to speak more in your native language (which, if you’re Indian, is unlikely to be your technical first language). Devoid of the idioms and acronyms that you normally fill your official conversation with, you are forced to think, and this practice of talking technical stuff in a non-usual language will help you cut your jargon.

There is another use case for using non-standard languages – dealing with extremely verbose prose. A number of commentators, a large number of whom are rather well-reputed, have this habit of filling their columns with flowery language, GRE words, repetition and rhetoric. While there is usually some useful content in these columns, it gets lost in the language and idioms and other things that would make the columnist’s high school English teacher happy.

I suggest that these columns be given the spirit-flesh treatment. Translate them into a non-English language, get rid of redundancies in sentences and then translate them back into English. This process, if the translators are good at producing simple language, will remove the bluster and make the column much more readable.

Speaking in a non-standard language can also make you get out of your comfort zone and think harder. Earlier this week, I spent two hours recording a podcast in Hindi on cricket analytics. My Hindi is so bad that I usually think in Kannada or English and then translate the sentence “live” in my head. And as you can hear, I sometimes struggle for words. Anyway here is the thing. Listen to this if you can bear to hear my Hindi for over an hour.

## Vlogging!

The first seed was sown in my head by Harish “the Psycho” J, who told me a few months back that nobody reads blogs any more, and I should start making “analytics videos” to increase my reach and hopefully hit a new kind of audience with my work.

While the idea was great, I wasn’t sure for a long time what videos I could make. After all, I’m not the most technical guy around, and I had no patience for making videos on “how to use regression” and stuff like that. I needed a topic that would be both potentially catchy and something where I could add value. So the idea remained an idea.

For the last four or five years, my most common lunchtime activity has been to watch chess videos. I subscribe to the Youtube channels of Daniel King and Agadmator, and most days when I eat lunch alone at home are spent watching their analyses of games. Usually this routine gets disrupted on Fridays when the wife works from home (she positively hates these videos), but one Friday a couple of months back I decided to ignore her anyway and watch the videos (she was in her room working).

She had come out to help herself to another serving of whatever she had made that day and saw me watching the videos. And she suddenly asked me why I couldn’t make such videos as well. She has seen me work over the last seven years to build what I think is a fairly cool cricket visualisation, and said that I should use it to make little videos analysing cricket matches.

And since then my constant “background process” has been to prepare for these videos. Earlier, Stephen Rushe of Cricsheet used to unfailingly upload ball by ball data of all cricket matches as soon as they were done. However, two years back he went into “maintenance mode” and has stopped updating the data. And so I needed a method to get data as well.

Here, I must acknowledge the contributions of Joe Harris of White Ball Analytics, who not only showed me the APIs to get ball by ball data of cricket matches, but also gave very helpful inputs on how to make the visualisation more intuitive, and palatable to the normal cricket fan who hasn’t seen such a thing before. Joe has his own win probability model based on ball by ball data, which I think is possibly superior to mine in a lot of scenarios (my model does badly in high-scoring run chases), though I’ve continued to use my own model.

So finally the data is ready, and I have a much improved visualisation compared to what I had during the IPL last year, and I’ve created what I think is a nice app using the Shiny package that you can check out for yourself here. This covers all T20 international games, and you can use the app to see the “story of each game”.

And this is where the vlogging comes in – in order to explain how the model works and how to use it, I’ve created a short video. You can watch it here:

While I still have a long way to go in terms of my delivery, you can see that the video has come out rather well. There are no sync issues, and you see my face also in one corner. This was possible due to my school friend Sunil Kowlgi’s Outklip app. It’s a pretty easy to use Chrome app, and the videos are immediately available on the platform. There is quick YouTube integration as well, for you to upload them.

And this is not a one time effort – going forward I’ll be making videos of limited overs games analysing them using my app, and posting them on my Youtube channel (or maybe I’ll make a new channel for these videos. I’ll keep you updated). I hope to become a regular Vlogger!

So in the meantime, watch the above video. And give my app a spin. Soon I’ll be releasing versions covering One Day Internationals and franchise T20s as well.

## Hypothesis Testing in Monte Carlo

I find it incredible, and not in a good way, that I took fourteen years to make the connection between two concepts I learnt barely a year apart.

In August-September 2003, I was auditing an advanced (graduate) course on Advanced Algorithms, where we learnt about randomised algorithms (I soon stopped auditing since the maths got heavy). And one important class of randomised algorithms is what is known as “Monte Carlo Algorithms”. Not to be confused with Monte Carlo Simulations, these are randomised algorithms that give a one way result. So, using the most prominent example of such an algorithm, you can ask “is this number prime?” and the answer to that can be either “maybe” or “no”.

The randomised algorithm can never conclusively answer “yes” to the primality question. If the algorithm can find a prime factor of the number (or, more generally, some proof that the number is composite), it answers “no” (this is conclusive). Otherwise it returns “maybe”. So the way you “conclude” that a number is prime is by running the test a large number of times. Each run reduces the probability that it is a “no” (since they’re all independent evaluations of “maybe”), and when the probability of “no” is low enough, you “think” it’s a “yes”. You might like this old post of mine regarding Monte Carlo algorithms in the context of romantic relationships.
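To make the “maybe or no” behaviour concrete, here is a minimal sketch of one such one-sided test. It uses the Miller-Rabin test, which is my choice for illustration and not necessarily the one taught in that course; it proves a number composite by finding a “witness” rather than an actual factor. The function name and the default number of rounds are made up for this example.

```python
import random


def monte_carlo_prime_check(n, rounds=20, seed=None):
    """One-sided primality check: returns "no" (conclusive) or "maybe".

    Illustrative Miller-Rabin sketch; the name and defaults are
    assumptions made for this example.
    """
    rng = random.Random(seed)
    if n < 2:
        return "no"
    if n in (2, 3):
        return "maybe"
    if n % 2 == 0:
        return "no"

    # Write n - 1 as 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # this base says "maybe"
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break  # this base says "maybe"
        else:
            return "no"  # a is a witness: n is definitely composite
    return "maybe"  # no round found evidence against primality


if __name__ == "__main__":
    for candidate in (97, 91, 561, 2**61 - 1):
        print(candidate, monte_carlo_prime_check(candidate))
```

A prime can never produce a “no”. For a composite number, each round wrongly says “maybe” with probability at most 1/4, so after k independent rounds the chance that a composite survives is at most 4^-k. That is the sense in which repeated “maybes” let you “think” it’s a “yes”.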

Less than a year later, in July 2004, as part of a basic course in statistics, I learnt about hypothesis testing. Now (I’m kicking myself for failing to see the similarity then), the main principle of hypothesis testing is that you can never “accept a hypothesis”. You either reject a hypothesis or “fail to reject” it. And if you fail to reject a hypothesis with a certain high probability (basically with more data, which implies more independent evaluations that don’t say “reject”), you will start thinking about “accept”.

Basically hypothesis testing is a one-sided test, where you are trying to reject a hypothesis. And not being able to reject a hypothesis doesn’t mean we necessarily accept it – there is still the chance of going wrong if we were to accept it (this is where we get into messy territory such as p-values). And this is exactly like Monte Carlo algorithms – one-sided algorithms where we can only conclusively take a decision one way.
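The same two-valued answer shows up if you code a simple test. Below is a rough sketch of an exact binomial test of “this coin is fair”. The coin example, the function names and the 5% threshold are all assumptions picked for illustration. Note that the function only ever says “reject” or “fail to reject”, never “accept”.

```python
from math import comb


def binomial_two_sided_p_value(heads, flips, p0=0.5):
    """Exact two-sided p-value under the null that P(heads) = p0.

    Sums the probability of every outcome that is no more likely
    than the observed one.
    """
    pmf = [
        comb(flips, k) * p0**k * (1 - p0) ** (flips - k)
        for k in range(flips + 1)
    ]
    observed = pmf[heads]
    # Small tolerance so that ties are counted despite rounding.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def fair_coin_verdict(heads, flips, alpha=0.05):
    """Return "reject" or "fail to reject"; there is no "accept"."""
    p_value = binomial_two_sided_p_value(heads, flips)
    return "reject" if p_value < alpha else "fail to reject"


if __name__ == "__main__":
    print(fair_coin_verdict(60, 100))    # 60% heads, little data
    print(fair_coin_verdict(600, 1000))  # 60% heads, more data
```

With these assumed numbers, 60 heads in 100 flips is not enough to reject fairness at the 5% level, while the same proportion over 1000 flips is.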

So I was thinking of these concepts when I came across this headline in ESPNCricinfo yesterday that said “Rahul Johri not found guilty” (not linking since Cricinfo has since changed the headline). The choice, or rather ordering, of words was interesting. “Not found guilty”, it said, rather than the usual “found not guilty”.

This is again a concept of one-sided testing. An investigation can either find someone guilty or fail to do so, and the heading in this case suggested that the latter had happened. It seemed a deliberate choice, and it later became apparent why the headline was constructed this way: it emerged that the decision to clear Rahul Johri of sexual harassment charges was a contentious one.

In most cases, when someone is “found not guilty” following an investigation, it usually suggests that the evidence on hand was enough to say that the chance of the person being guilty was rather low. The phrase “not found guilty”, on the other hand, says that one test failed to reject the hypothesis, but it didn’t have sufficient confidence to clear the person of guilt.

So due credit to the Cricinfo copywriters, and due debit to the product managers for later changing the headline rather than putting out a fresh follow-up piece.

PS: The discussion following my tweet on the topic threw up one very interesting insight: Scotland has had a “not proven” verdict in the past for such cases (you can trust DD for coming up with such gems).

## Conversation with an Afghan-Dutch taxi driver

We got back to London yesterday, and were welcomed with atypical London weather – thunderstorms. While it is common to stereotype London’s weather as being typically shitty and grey, it doesn’t normally rain all that heavily here – most of the rain that London gets is what is called “spitting rain” – slow drizzly rain best dealt with with a nice cap.

Also welcoming us was an Afghan-Dutch guy who drove us home in his Merc (we hired him through Uber). We got talking and there were a few interesting things from what he said that I thought were pertinent.

• When we told him we were from Bangalore he said something that sounded like “cooley”. First we interpreted it as him saying that the city is cool, and then realised that wasn’t what he was saying. Then I thought he was talking about Coolie, which was filmed in Bangalore, but it wasn’t that either. Finally we realised he was talking about Virat Kohli, who plays for Royal Challengers Bangalore. It’s funny how Kohli is identified with Bangalore abroad though he’s only nominally based there, and only during the IPL season.
• We spoke a bit about the IPL and he said he was disappointed that “our team” lost. A minute later he said the team was Sunrisers Hyderabad. For a while it wasn’t clear as to why the Sunrisers were his team. Then I realised they have two prominent Afghan players – Rashid Khan and Mohammad Nabi.
• He was studying to be a dentist, and decided to spend time in England learning English because a lot of the dental course was in English. Apart from putting himself through formal English classes, driving an Uber was a way for him to make money while becoming better at English (it’s interesting how at times in our conversation he switched to using Hindi words – some of which I’m guessing are common to Pashto as well).
• My wife later told me that it was common for continental Europeans to spend a gap year in England learning English. And that apart from taking classes they take up jobs where they can practice the language – like driving a taxi or waiting tables.
• The conversation also got me thinking about gap years and saving up for education – something that doesn’t at all happen in India. In India, the standard practice is to go to college immediately after school, when one is still being funded by parents. In one way, this reduces social mobility since people whose parents can’t afford college end up not studying. Also, the returns to education in India are high enough that the compensation for blue collar jobs (that one can find without a college degree) isn’t enough to fund a later degree.
• Despite having Afghan parents, this guy has never been there. “It’s way too dangerous. I can go see relatives but will end up spending most time indoors, so not much fun”, he said.

Every time I have a conversation with a taxi driver I’m reminded of what I was told by a friend on the day I moved to Delhi in 2008. “It might be common in Bangalore to chat up auto and taxi drivers”, he had told me, “but in Delhi it is not the done thing”. I still wonder why.

## A banker’s apology

Whenever there is a massive stock market crash, like the one in 1987, or the crisis in 2008, it is common for investment banking quants to talk about how it was a “1 in zillion years” event. This is on account of their models that typically assume that stock prices are lognormal, and that stock price movement is Markovian (today’s movement is uncorrelated with tomorrow’s).

In fact, a cursory look at recent data shows that what models show to be a one in zillion years event actually happens every few years, or decades. In other words, while quant models do pretty well in the average case, they have thin “tails” – they underestimate the likelihood of extreme events, which allows risk to build up in the system.
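To get a feel for where the “1 in zillion years” figure comes from, here is a back-of-the-envelope sketch. It assumes daily returns are normally distributed and independent, and simply asks how long you would wait on average for a one-day fall of k standard deviations. The 252 trading days a year and the choice of k are my assumptions. For scale, if daily volatility is around 1%, a one-day fall of over 20% like the one in October 1987 is a 20-plus sigma event.

```python
from math import erfc, sqrt


def normal_left_tail(k_sigma):
    """P(Z < -k_sigma) for a standard normal variable Z."""
    return 0.5 * erfc(k_sigma / sqrt(2))


def years_between_events(k_sigma, trading_days_per_year=252):
    """Average wait, in years, for a daily fall of at least k sigma,
    assuming independent, normally distributed daily returns."""
    daily_probability = normal_left_tail(k_sigma)
    return 1.0 / (daily_probability * trading_days_per_year)


if __name__ == "__main__":
    for k in (3, 5, 10, 20):
        print(f"{k}-sigma daily fall: once in {years_between_events(k):.3g} years")
```

The waiting time explodes as k grows, which is exactly why a model with normal tails calls something a once-in-the-age-of-the-universe event when markets deliver it every few decades.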

When I decided to end my (brief) career as an investment banking quant in 2011, I wanted to take the methods that I’d learnt into other industries. While “data science” might have become a thing in the intervening years, there is still a lot for conventional industry to learn from banking in terms of using maths for management decision-making. And this makes me believe I’m still in business.

And like my former colleagues in investment banking quant, I’m not immune to the fat tail problem either – replicating solutions from one domain into another can replicate the problems as well.

For a while now I’ve been building what I think is a fairly innovative way to represent a cricket match. Basically you look at how the balance of play shifts as the game goes along. So the representation is a line graph that shows where the balance of play was at different points of time in the game.

This way, you have a visualisation that at one shot tells you how the game “flowed”. Consider, for example, last night’s game between Mumbai Indians and Chennai Super Kings. This is what the game looks like in my representation.

What this shows is that Mumbai Indians got a small advantage midway through the innings (after a short blast by Ishan Kishan), which they held through their innings. The game was steady for about 5 overs of the CSK chase, when some tight overs created pressure that resulted in Suresh Raina getting out.

Soon, Ambati Rayudu and MS Dhoni followed him to the pavilion, and MI were in control, with CSK losing 6 wickets in the course of 10 overs. When they lost Mark Wood in the 17th Over, Mumbai Indians were almost surely winners – my system reckoning that 48 to win in 21 balls was near-impossible.

And then Bravo got into the act, putting on 39 in 10 balls with Imran Tahir watching at the other end (including taking 20 off a Mitchell McClenaghan over, and 20 again off a Jasprit Bumrah over at the end of which Bravo got out). And then a one-legged Jadhav came, hobbled for 3 balls and then finished off the game.

Now, while the shape of the curve in the above graph is representative of what happened in the game, I think it went too close to the axes. 48 off 21 with 2 wickets in hand is not easy, but it’s not a 1% probability event (as my graph depicts).

And looking into my model, I realise I’ve made the familiar banker’s mistake – of assuming independence and the Markovian property. I calculate the probability of a team winning using a method called “backward induction” (that I’d learnt during my time as an investment banking quant). It’s the same method used by WASP, the system to evaluate odds invented by a few Kiwi scientists, and as I’d pointed out in the past, WASP has the thin tails problem as well.
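For those who haven’t seen backward induction, here is a toy sketch of the idea for a run chase. This is not my actual model. The per-ball outcome probabilities are invented for illustration, extras are ignored, and every ball is treated as independent with the same distribution, which is precisely the assumption that produces thin tails. The win probability in any state is the probability-weighted average of the win probabilities in the states reachable with the next ball, worked back from the end of the innings.

```python
from functools import lru_cache

# Hypothetical per-ball outcome probabilities (made up for this sketch).
RUN_PROBABILITIES = {0: 0.35, 1: 0.35, 2: 0.08, 4: 0.11, 6: 0.06}
WICKET_PROBABILITY = 0.05  # together with the above, sums to 1


@lru_cache(maxsize=None)
def chase_win_probability(balls_left, runs_needed, wickets_left):
    """Probability that the chasing team wins from this state.

    Memoised recursion: each state is valued from the states that
    follow it, starting from the terminal states.
    """
    if runs_needed <= 0:
        return 1.0  # target reached
    if balls_left == 0 or wickets_left == 0:
        return 0.0  # out of balls or out of batsmen

    # Wicket falls: one ball and one wicket fewer, same target.
    win = WICKET_PROBABILITY * chase_win_probability(
        balls_left - 1, runs_needed, wickets_left - 1
    )
    # Runs scored: one ball fewer, smaller target.
    for runs, probability in RUN_PROBABILITIES.items():
        win += probability * chase_win_probability(
            balls_left - 1, runs_needed - runs, wickets_left
        )
    return win


if __name__ == "__main__":
    # The situation from the game above: 48 needed off 21 balls, 2 wickets in hand.
    print(chase_win_probability(21, 48, 2))
```

Because every ball is drawn from the same fixed distribution, a batsman who is “on a roll” is invisible to this kind of model. A burst like 39 off 10 balls requires a long run of independently unlikely outcomes, so the model rates it as far less likely than it is in practice.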

As Seamus Hogan, one of the inventors of WASP, had pointed out in a comment on that post, one way of solving this thin tails issue is to control for the pitch or regime, and I’ve incorporated that as well (using a Bayesian system to “learn” the nature of the pitch as the game goes on). Yet, I see I struggle with fat tails.

I seriously need to find a way to take into account serial correlation into my models!

That said, I must say I’m fairly kicked about the system I’ve built. Do let me know what you think of this!

## PM’s Eleven

The first time I ever heard of Davos was in 1997, when then Indian Prime Minister HD Deve Gowda attended the conference in the ski resort and gave a speech. He was heavily pilloried by the Kannada media, and given the moniker “Davos Gowda”.

Maybe because of all the attention Deve Gowda received for the trip, and not in a good way, no Indian Prime Minister ventured to go there for another twenty years. Until, of course, Narendra Modi went there earlier this week and gave a speech that apparently got widely appreciated in China.

There is another thing that connects Modi and Deve Gowda as Prime Ministers (leaving aside trivialities such as them being chief ministers of their respective states before becoming Prime Ministers).

Back in 1996 when Deve Gowda was Prime Minister, Rahul Dravid, Venkatesh Prasad and Sunil Joshi made their Test debuts (on the tour of England). Anil Kumble and Javagal Srinath had long been fixtures in the Indian cricket team. Later that year, Sujith Somasunder played a couple of one dayers. David Johnson played two Tests. And in early 1997, Doddanarasaiah Ganesh played a few Test matches.

In case you haven’t yet figured out, all these cricketers came from Karnataka, the same state as the Prime Minister. During that season, it was normal for at least five players in the Indian Eleven to be from Karnataka. Since Deve Gowda had become Prime Minister around the same time, there was no surprise that the Indian cricket team was called “PM’s Eleven”. Coincidentally, the chairman of selectors at that point in time was Gundappa Vishwanath, who is also from Karnataka.

The Indian team playing in the current Test match in Johannesburg has four players from Gujarat. Now, this is not as noticeable as five players from Karnataka because Gujarat is home to three Ranji Trophy teams. Cheteshwar Pujara plays for Saurashtra, Parthiv Patel and Jasprit Bumrah play for Gujarat, and Hardik Pandya plays for Baroda. And Saurashtra’s Ravindra Jadeja is also part of the squad.

It had been a long time since one state had thus dominated the Indian cricket team. Perhaps we hadn’t seen this kind of domination since Karnataka had dominated in the late 1990s. And it so happens that once again the state dominating the Indian cricket team happens to be the Prime Minister’s home state.

So after a gap of twenty one years, we had an Indian Prime Minister addressing Davos. And after a gap of twenty one years, we have an Indian cricket team that can be called “PM’s Eleven”!

As Baada put it the other day, “Modi is the new Deve Gowda. Just without family and sleep”.

Update: I realised after posting that I have another post called “PM’s Eleven” on this blog. It was written in the UPA years.