Re-gifting for rebel girls

On the occasion of her birthday a couple of months back, the daughter received a copy of “Goodnight stories for rebel girls”. As you might have guessed, this was not the first time she had received this book – we had bought it for her earlier this year.

Interestingly, barely a week or two before, some friends who were visiting had gifted her “goodnight stories for rebel girls – 2”, saying that they were pretty confident that she would already have the first book.

In any case, once you have a second copy of a book, the honourable thing to do is to gift it to someone. So the copy of goodnight stories (1) that was received for the daughter’s birthday immediately went into our “regifting cupboard”. It continues to sit there.

The question is who to regift it to. It is a fair hypothesis that most girls will already have the book. So regifting it to another girl will only send it on an endless orbit of regifting. The logical corollary is that we need to regift it to a boy.

Here is where it gets a bit complicated. We’re pretty sure that any boy being brought up by a “feminist mother” (or “feminist parents”) will already have a copy of the book. And if we were to gift the book to a boy whose parents aren’t feminist, we’ll only end up pissing off the boy’s parents.

Who said giving a gift is easy?

Number fourteen

I killed another rat this morning. The fourteenth of my life. This came six years after my thirteenth. And it was also the hardest, forcing me to take the help of “unnatural support” to trap and kill it.

Unfortunately this was the best picture we could get, since I had decided to close the sticky mat after killing the rat on it

I first noticed the rat on Monday night when I was talking to a friend. I had stepped out of the bedroom to take the call, and we were barely done with pleasantries when I said, “shit, there is a rat in my house”. “Oops, do you need to go? Are you scared?”, he asked. “Not scared, but I need to kill it”, I said, and ran.

As it happened, I was wearing my AirPods, and I ended up running too far away from my phone, and the call got cut. I had seen the rat going under the dining table, and then into the kitchen. By the time I fetched a stick broom (the one usually used to sweep outdoors – they are excellent for killing rats – see my left hand in the picture above), the rat had disappeared.

The fundamental principle of killing rats is to isolate the rat in one room that is preferably “open” (without too many nooks and corners where it can hide). Our living room in this house is especially unsuited for this purpose since it has too many orifices, and many of these orifices can’t be shut.

In any case, I saw the rat hiding inside the back of the refrigerator. The idea was to move the refrigerator and whack it as soon as it ran out. Unfortunately, with my reflexes not being what they used to be, I wasn’t able to whack it adequately and the rat ran into my daughter’s room (she doesn’t sleep there yet).

This was both a good and a bad thing. The good thing was that the rat could be isolated inside this room. The bad thing was that there are too many things in this room, making it impossible to trap a rat there. I tried anyway, with a broom and a stick, for twenty minutes before giving up and calling back my friend.

Yesterday was an attritional battle. We woke up to the sight of the rat having tried to gnaw at the room door. It was nowhere to be found, though. I went to a nearby shop and got some rat poison (in the form of “cakes”), and for good measure also got a rat sticky board.

Representative image of a rat glue trap

I left some old potato chips in the middle of this pad, and spread the poison cakes throughout the room. Every two or three hours through yesterday I kept going in to check if the rat had eaten the cakes or otherwise been trapped. There was no luck.

This morning there was evidence once again that the rat had tried to gnaw the bottom of the room door. It was time for more proactive measures. The first step was to empty out the room. The amount of stuff (toys, dolls, games, etc.) that my daughter has is insane. Having made a mental note to “Marie Kondo her stuff” later today, I went on to finding the rat.

Despite mostly emptying the room, the rat was nowhere to be found. This reminded me of computer programming. Sometimes you know there is a bug, but you just aren’t able to find it. Finally, after more than an hour of search, I found that the rat had made itself cosy in the window curtains.

In computer programming, once you’ve found a bug, fixing it is relatively easy. With physical rodents, it’s not so straightforward. The rat started giving me a hard time.

Out (of the room) came the curtains. Out (of the room) came these boxes in which my daughter stores her toys. It was to no avail, as the rat cleverly used the mattress as a shield (irrespective of how I placed the mattress – horizontal / vertical / whatever). Finally, having made sure that the rat wasn’t in the mattress, that was pushed out of the room as well.

In general, catching a rat needs two people. One person prods from one side and the other person whacks from the other. My first ever experience of killing a rat (it’s counted in the 14) came as an assistant to my father, who had handed me a cricket bat when a rat had dared to come to our bathroom.

On subsequent occasions, I’ve used my aunt, my aunt’s housekeeper, my mother-in-law and others as my assistants. Today, there was no such help coming. My wife was too scared, and she had convinced the daughter as well that rats can be scary, so I was left to my own devices.

And it was my devices – one that I had purchased yesterday, to be precise – that came of use. I had noticed that the rat kept running under a chest of drawers every time I attacked it. So I strategically left the sticky mat under the chest of drawers, and kept chasing the rat under it. And one time, it stuck.

A couple of whacks with the broom finished it off. “Fourteen”, I shouted. I admit I sort of “cheated”, by using “unnatural aids” (the sticky mat) in this process. In my defence, I didn’t have any human support so was forced to use this.

Start the game already!

69 is the answer

The IDFC-Duke-Chicago survey that concluded that 50% of Bangalore had covid-19 in late June only surveyed 69 people in the city. 

When it comes to most things in life, the answer is 42. However, if you are trying to rationalise the IDFC-Duke-Chicago survey that found that over 50% of people in Bangalore had had covid-19 by end-June, then the answer is not 42. It is 69.

For that is the sample size that the survey used in Bangalore.

Initially I had missed this as well. However, this evening I attended half of a webinar where some of the authors of the survey spoke about the survey and the paper, and there they let the penny drop. And then I found it, in one small table in the paper.

The IDFC-Duke-Chicago survey only surveyed 69 people in Bangalore

The above is the table in its glorious full size. It takes effort to read the numbers. Look at the second last line. In Bangalore Urban, the ELISA results (for antibodies) were available for only 69 people.

And if you look at the appendix, you find that 52.5% of respondents in Bangalore had antibodies to covid-19 (that is 36 people). So in late June, they surveyed 69 people and found that 36 had antibodies for covid-19. That’s it.
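To get a feel for how little 36 out of 69 pins down, here is a rough sketch (not from the paper) of a 95% Wilson score interval for that proportion. It assumes the 69 people were a simple random sample, which the rest of this post argues they were not, so treat it as a best case. The function name `wilson_interval` is just for illustration.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# 36 positives out of 69 ELISA results, as read off the paper's table and appendix.
low, high = wilson_interval(36, 69)
print(f"point estimate: {36 / 69:.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

Even under that generous assumption, the interval runs from roughly 41% to 64%, a spread of over twenty percentage points.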

To their credit, they didn’t highlight this result (I sort of dug through their paper to find these numbers and call the survey into question). And they mentioned in tonight’s webinar as well that their objective was to get an idea of the prevalence in the state, and not just in one particular region (even if it be as important as Bangalore).

That said, there are two things they said during the webinar in defence of the paper that I thought I should point out here.

First, Anu Acharya of MapMyGenome (also a co-author of the survey) said “people have said that a lot of people we approached refused consent to be surveyed. That’s a standard of all surveying”. That’s absolutely correct. In any random survey, you will always have an implicit bias because the sort of people who will refuse to get surveyed will show a pattern.

However, in this particular case, the point to note is the extremely high number of people who refused to be surveyed – over half the households in the panel refused to be surveyed, and in a further quarter of the panel households, the identified person refused to be surveyed (despite the family giving clearance).

One of the things with covid-19 in India is that in the early days of the pandemic, anyone found having the disease would be force-hospitalised. I had said back then (not sure where) that hospitalising asymptomatic people was similar to the “precogs” in Minority Report – you confine the people because they MIGHT INFECT OTHERS.

For this reason, people didn’t want to get tested for covid-19. If you accidentally tested positive, you would be institutionalised for a week or two (and be made to pay for it, if you demanded a private hospital). Rather, unless you had clear symptoms or were ill, you were afraid of being tested for covid-19 (whether RT-PCR or antibodies, a “representative sample” won’t understand).

However, if you had already got covid-19 and “served your sentence”, you would be far less likely to be “afraid of being tested”. This, in conjunction with the rather high proportion of the panel that refused to get tested, suggests that there was a clear bias in the sample. And since the numbers for Bangalore clearly don’t make sense, it lends credence to the sampling bias.
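The mechanics of this bias can be sketched in a few lines. The consent rates below are entirely hypothetical (the paper does not report consent by infection status); the point is only to show how much a difference in willingness to be tested can inflate the measured prevalence.

```python
def observed_prevalence(true_prevalence: float,
                        consent_if_infected: float,
                        consent_if_not: float) -> float:
    """Share of consenting respondents who test positive, when the
    chance of consenting depends on whether the person was infected."""
    positives = true_prevalence * consent_if_infected
    negatives = (1 - true_prevalence) * consent_if_not
    return positives / (positives + negatives)

# Hypothetical consent rates: previously infected people agree 60% of the
# time, everyone else only 20% of the time.
for true_p in (0.05, 0.10, 0.20):
    est = observed_prevalence(true_p, consent_if_infected=0.6, consent_if_not=0.2)
    print(f"true prevalence {true_p:.0%} -> survey would show {est:.0%}")
```

With these made-up rates, a true prevalence of 10% shows up as 25% in the survey. When both groups consent at the same rate, the function returns the true prevalence, as it should.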

And sample size apart, there is nothing Bangalore-specific about this bias (apart from that in some parts of the state, the survey happened after people had sort of lost their fear of testing). This further suggests that overall state numbers are also an overestimate (which fits in with my conclusion in the previous blogpost).

The other thing that was mentioned in the webinar that sort of cracked me up was the reason why the sample size was so low in Bangalore – a lockdown got announced while the survey was on, and the sampling team fled. In today’s webinar, the paper authors went off on a rant about how surveying should be classified as an “essential activity”.

In any case, none of this matters. All that matters is that 69 is the answer.

 

Join a boss or join a company?

“You don’t quit your job. You quit your boss”.

Versions of this keep popping up on my LinkedIn with amazing regularity. People have told me this in a non-ironic way in personal conversations as well, so I assume that it is true.

And now that I’m back in the job market, I’ve been thinking of a corollary to this. If you apply “backward induction” to the above statement, it essentially means that you “join a boss” rather than “join a company”.

I mean – if the boss is the reason why you quit a particular job, then shouldn’t you be thinking about this at the time when you’re joining as well? And so, while you’re interviewing and having these conversations, shouldn’t you be on the lookout for potential bad bosses as well?

In that sense, as I go through my hunt, I’ve been evaluating companies not just on the basis of what they do and what they might expect me to do, but also on the basis of what I feel about the people I talk to. In some places, I have an idea of who I could potentially report to, and in some I don’t. However, I treat pretty much everyone I talk to as people I have to potentially report to or work with at some point of time or the other, and evaluate the company based on these conversations.

Sometimes I think this might be too conservative, but at other times I think that this conservatism now is worth any potential trouble later.

What do you think about this approach?

Upgrade effect in action

So the workflow goes like this. Sometime a week to 10 days back, I read about the “upgrade effect”. It has to do with why people upgrade their iPhones every 1-2 years even though an iPhone is designed to last much longer (mine is 5 years old and going strong).

The theory is that once you know an “exciting upgrade” is available, you start becoming careless with your device. And then when the device suffers a small amount of damage, you seize the chance to upgrade.

I’m typing this on a MacBook Pro that is 6 years old. It is one of the last “old Macbooks” with the “good keyboard” (the one with keys that travel. I’ve forgotten if this is “butterfly” or “scissor”).

With consistently bad feedback about the other keyboard (the one where keys didn’t travel), I was very concerned about having to replace my Mac. And so I took extra good care of it. Though, this is what the keyboard has come to look like.

Last year I dropped a cup of milk tea on it, and panicked. Two days of drying it out helped, and the computer continued to work as it did (though around the same time the battery life dropped). Last year Apple reintroduced the old keyboard (with keys that travel), and I made a mental note to get a new laptop presently.

However, with this year having been locked down, battery life has ceased to be a problem for me (I don’t have to work in cafes or other places without charging points any more). And so I have soldiered on with my old Mac. And I’ve continued to be happy with it (I continue to be happy with my iPhone 6S as well).

And then on Wednesday I saw the announcement of the new M1 chip in the new Macbook Pro, with much enhanced battery and performance. I got really excited and thought this is a good time to upgrade my computer. And that I will “presently do it”.

I don’t know if I had the article about the “upgrade effect” in mind, but the same afternoon, sitting with my laptop on my lap and watching TV at the same time, I dropped it (I forget how exactly that happened. I was juggling multiple things and my daughter, and the computer dropped). I dropped it right on the screen.

Immediately it seemed fine. However, since yesterday, some black bands have appeared on the screen. Thankfully this is at one edge so it doesn’t affect “regular work” (though for the last 3-4 months I’ve been using an external monitor at home). Yet, now I have a good reason to replace my laptop sooner than usual.

Based on the reviews so far (all of them have come before the actual hardware has shipped), I’m excited about finally upgrading my Mac. And this computer will then get donated to my daughter (she has figured out how to type even on a keyboard that looks like the above).

I hadn’t imagined that soon after learning about the “upgrade effect” I would fall for it. Woresht.

Is handwriting hereditary?

I don’t know the answer to that question. However, I have a theory on how handwriting passes on down the generations.

So my daughter goes to a Montessori. There they don’t teach them to read and write at a very early age (I could read by the time I was 2.5, but she learnt to read only recently, when she was nearing 4). And there is a structured process to recognising letters (or “sounds” as they call them) and to being able to draw them.

There are these sandpaper letters that the school has, and children are encouraged to “trace” them, using two fingers, so they know how the letters “flow”. And then this tracing helps first in identifying the sounds, and later writing them.

With school having been washed out pretty much all of this year, we have been starved of these resources. Instead, over a 2 hour Zoom call one Saturday in July, the teachers helped parents make “sound cards” by writing using a marker on handmade paper (another feature of Montessori is the introduction of cursive sounds at a young age. Children learn to write cursive before they learn to write print, if at all).

So when Berry has to learn how a particular sound is to be written, it is these cards that I have written that she has to turn to (she knows that different fonts exist in terms of reading, but that she should write in cursive when writing). She essentially traces the sounds that I have written with two fingers.

And then in the next step, I write the sounds on a slate (apparently it’s important to do this before graduating to pencil), and then she uses a different coloured chalk and traces over them. Once again she effectively traces my handwriting. Then earlier this week, during a “parent and child zoom class” organised by her school, she wanted to write a word and wasn’t able to write the full word in cursive and asked for my help. I held her hand and made her write it. My handwriting again!

Now that I realise why she seems to be getting influenced by my handwriting, I should maybe hand over full responsibility of teaching writing to the wife, whose handwriting is far superior to mine.

The trigger for this post was my opening of a notebook in which I had made notes during a meeting earlier this week (I usually use the notes app on the computer but had made an exception). Two things struck me before I started reading my notes – that my handwriting is similar to my father’s, and my handwriting is horrible (easily much worse than my father’s). And then I was reminded of earlier this week when I held my daughter’s hand and made her write.

This is how handwriting runs in the family.

Record of my publicly available work

A few people who I’ve spoken to as part of my job hunt have asked to see some “detailed descriptions” of work that I’ve done. The other day, I put together an email with some of these descriptions. I thought it might make sense to “document” it in one place (and for me, the “obvious one place” is this blog). So here it is. As you might notice, this takes the form of an email.


I’m putting together links to some of the publicly available work that I’ve done.
1. Cricket
I have a model to evaluate and “tell the story of a cricket match”. This works for all limited overs games, and is based on a dynamic programming algorithm similar to the WASP. The basic idea is to estimate the odds of each team winning at the end of each ball, and then chart that out to come up with a “match story”.
And through some simple rules-based intelligence, the key periods in the game are marked out.
The model can also be used to evaluate the contributions of individual batsmen and bowlers towards their teams’ cause, and when aggregated across games and seasons, can be used to evaluate players’ overall contributions.
Here is a video where I explain the model and how to interpret it:
The algorithm runs live during a game. You can evaluate the latest T20 game here:
Here is a more interactive version, including a larger selection of matches going back in time.
Related to this is a cricket analytics newsletter I actively wrote during the World Cup last year. Most Indians might find this post from the newsletter interesting:
2. Covid-19
At the beginning of the pandemic (when we had just gone under a national lockdown), I had built a few agent based models to evaluate the risk associated with different kinds of commercial activities. They are described here.
Every morning, a script that I have written parses the day’s data from covid19india.org and puts out some graphs to my twitter account. This is a daily fully automated feature.
Here is another agent based model that I had built to model the impact of social distancing on covid-19.
A tweetstorm based on Bayes’ Theorem that I wrote during the pandemic went viral enough that I got invited to a prime time news show (I didn’t go).
3. Visualisations
I used to collect bad visualisations.
I also briefly wrote a newsletter analysing “good and bad visualisations”.
4. I have an “app” to predict which single malts you might like based on your existing likes. This blogpost explains the process behind (a predecessor of) this model.
5. I had some fun with machine learning, using different techniques to see how they perform in terms of predicting different kinds of simple patterns.
6. I used to write a newsletter on “the art of data science”.
In addition to this, you can find my articles for Mint here. Also, this page on my website has links to some anonymised case studies.

I guess that’s a lot? In any case, now I’m wondering if I did the right thing by choosing “skthewimp” as my Github username.
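For the curious, the dynamic programming idea behind the cricket model (item 1 in the email) can be sketched roughly as below. This is a toy illustration, not my actual model: it only handles the chasing side, and the per-ball outcome probabilities in `BALL_OUTCOMES` are hypothetical constants, where a real model would estimate them from data and let them vary with the match situation.

```python
from functools import lru_cache

# Hypothetical per-ball outcome probabilities (runs scored, or "W" for a wicket).
# These are made-up constants that sum to 1; extras are ignored.
BALL_OUTCOMES = {0: 0.35, 1: 0.35, 2: 0.08, 3: 0.02, 4: 0.10, 6: 0.05, "W": 0.05}

@lru_cache(maxsize=None)
def chase_win_probability(balls_left: int, runs_needed: int, wickets_left: int) -> float:
    """Probability that the chasing team wins from the given state.
    A tie is counted as a loss to keep the sketch simple."""
    if runs_needed <= 0:
        return 1.0
    if balls_left == 0 or wickets_left == 0:
        return 0.0
    total = 0.0
    for outcome, prob in BALL_OUTCOMES.items():
        if outcome == "W":
            total += prob * chase_win_probability(balls_left - 1, runs_needed, wickets_left - 1)
        else:
            total += prob * chase_win_probability(balls_left - 1, runs_needed - outcome, wickets_left)
    return total

# A toy "match story": 30 needed off 18 balls with 5 wickets in hand,
# followed by a made-up sequence of deliveries.
balls, runs, wickets = 18, 30, 5
print(f"start: {chase_win_probability(balls, runs, wickets):.1%}")
for outcome in [1, 4, "W", 0, 6, 1]:
    balls -= 1
    if outcome == "W":
        wickets -= 1
    else:
        runs -= outcome
    print(f"after {outcome!s:>2}: {chase_win_probability(balls, runs, wickets):.1%}")
```

Charting those win probabilities ball by ball is what gives the “match story”. The balls where the number jumps the most are candidates for the key moments of the game.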

Core quants and desk quants on main street

The more perceptive of you might have realised that I’m in the job market.

Over the last one month, my search has mostly been “breadth first” (lots of exploratory conversations with lots of companies), and I’m only now starting to “go deep” into some of them. As part of this process, I need to send out a pitch to a company I’ve been in conversation with regarding what I can do for them.

So I’ve been thinking of how to craft my mandate while keeping in mind that they have an existing data science team. And while I was thinking about this problem, I realised that I can model it like how investment banks (at least one that I worked for) do – in terms of “core quants” and “desk quants”.

I have written about this on my blog before – most “data scientists” in industry are equivalent to what investment banks call “core quants”. They are usually highly technically accomplished people; in many cases they are people who were on an academic path that they left to turn to industry. They do very well in “researchy” environments.

They’re great at running long-gestation-period assignments, working on well defined technical problems and expressing their ideas in code. In general, though (I know I’m massively generalising), they are not particularly close to the business and struggle to deal with the ambiguities that business throws at them from time to time.

What I had mentioned in my earlier post is that “main street” (the American word for “general industry”) lacks “desk quants”. In investment banks, desk quants are attached to trading desks and work significantly closer to the business. They may work less on firmwide or long term strategic projects, but their strength is in blending the models and the markets, and building and making simple tweaks to models so that they remain relevant to the business.

And this is the sort of role in which I’m planning to pitch myself – to all potential employers. That while I’m rather comfortable technically, and with all sorts of different modelling techniques, I’m not “deep into tech” and like to work close to the markets. I realise that this analogy will be lost on most people, so I need to figure out a better way of marketing myself. Any ideas will be appreciated.

Over the last month or so I’ve been fairly liberal in using my network to get introductions and references. The one thing I’ve struggled with there is how they describe me. Most people end up describing me as a “data scientist”, and I’m not sure that’s an accurate description of what I do. Then again, it’s my responsibility to help them figure out how best to describe me. And that’s another thing I’m struggling with. “Desk quant” doesn’t translate well.

More on Covid-19 prevalence in Karnataka

As the old song went, “when the giver gives, he tears the roof and gives”.

Last week the Government of Karnataka released its report on the covid-19 serosurvey done in the state. You might recall that it had concluded that the number of cases had been undercounted by a factor of 40, but then some things were suspect in terms of the sampling and the weighting.

This week comes another sero-survey, this time a preprint of a paper that has been submitted to a peer reviewed journal. This survey was conducted by the IDFC Institute, a think tank, and involves academics from the University of Chicago and Duke University, and relies on the extensive sampling network of CMIE.

At the broad level, this survey confirms the results of the other survey – it concludes that “Overall seroprevalence in the state implies that by August at least 31.5 million residents had been infected by August”. This is much higher than the overall conclusions of the state-sponsored survey, which had concluded that “about 19 million residents had been infected by mid-September”.

I like seeing two independent assessments of the same quantity. While each may have its own sources of error, and may not independently offer much information, comparing them can offer some really valuable insights. So what do we have here?

The IDFC-Duke-Chicago survey took place between June and August, and concluded that 31.5 million residents of Karnataka (out of a total population of about 70 million) had been infected by covid-19. The state survey in September had suggested 19 million residents had been infected by September.

Clearly, since these surveys measure the number of people “who have ever been affected”, both of them cannot be correct. If 31 million people had been affected by end August, clearly many more than 19 million should have been infected by mid-September. And vice versa. So, as Ravi Shastri would put it, “something’s got to give”. What gives?

Remember that I had thought the state survey numbers might have been an overestimate thanks to inappropriate sampling (“low risk” not being low risk enough, and not weighting samples)? If 20 million by mid-September was an overestimate, what do you say about 31 million by end August? Surely an overestimate? And that is not all.

If you go through the IDFC-Duke-Chicago paper, there are a few figures and tables that don’t make sense at all. For starters, check out this graph, that for different regions in the state, shows the “median date of sampling” and the estimates on the proportion of the population that had antibodies for covid-19.

Check out the red line on the right. The sampling for the urban areas for the Bangalore region was completed by 24th June. And the survey found that more than 50% of respondents in this region had covid-19 antibodies. On 24th June.

Let’s put that in context. As of 24th June, Bangalore Urban had 1700 confirmed cases. The city’s population is north of 10 million. I understand that 24th June was the “median date” of the survey in Bangalore city. Even if the survey took two weeks after that, as of 8th of July, Bangalore Urban had 12500 confirmed cases.

The state survey had estimated that known cases were 1 in 40. 12500 confirmed cases suggests about 500,000 actual cases. That’s 5% of Bangalore’s population, not 50% as the survey claimed. Something is really really off. Even if we use the IDFC-Duke-Chicago paper’s estimates that only 1 in 100 cases were reported / known, then 12500 known cases by 8th July translates to 1.25 million actual cases, or 12.5% of the city’s population (well below 50%).
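The back-of-envelope arithmetic above is easy to replay. The sketch below uses a round 10 million for Bangalore’s population (the post only says “north of 10 million”, so the true shares would be slightly lower), and the function names are just for illustration.

```python
BANGALORE_POPULATION = 10_000_000  # assumption: rounded down from "north of 10 million"

def implied_infection_share(confirmed_cases: int, undercount_factor: float,
                            population: int = BANGALORE_POPULATION) -> float:
    """Share of the population infected if each confirmed case
    stands for `undercount_factor` actual cases."""
    return confirmed_cases * undercount_factor / population

def factor_needed(confirmed_cases: int, target_share: float,
                  population: int = BANGALORE_POPULATION) -> float:
    """Undercount factor required for confirmed cases to imply `target_share`."""
    return target_share * population / confirmed_cases

print(f"1 in 40:  {implied_infection_share(12_500, 40):.1%}")
print(f"1 in 100: {implied_infection_share(12_500, 100):.1%}")
print(f"factor needed for 50%: {factor_needed(12_500, 0.50):.0f}")
```

Put differently, for half the city to have been infected by 8th July, only about 1 in 400 infections could have been detected, ten times the undercount the state survey found.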

My biggest discomfort with the IDFC-Duke-Chicago effort is that it attempts to sample a rather rapidly changing variable over a long period of time. The survey went on from June 15th to August 29th. By June 15th, Karnataka had 7200 known cases (and 87 deaths). By August 29th the state had 327,000 known cases and 5500 deaths. I really don’t understand how the academics who ran the study could reconcile their data from the third week of June to the data from the third week of August, when the nature of the pandemic in the state was very very different.

And now, having looked at this paper, I’m more confident of the state survey’s estimations. Yes, it might have sampling issues, but compared to the IDFC-Duke-Chicago paper, the numbers make so much more sense. So yeah, maybe the factor of underestimation of Covid-19 cases in Karnataka is 40.

Putting all this together, I don’t understand one thing. What these surveys have shown is that

  1. More than half of Bangalore has already been infected by covid-19
  2. The true infection fatality rate is somewhere around 0.05% (or lower).

So why do we still have a (partial) lockdown?

PS: The other day on WhatsApp I saw this video of an extremely congested Chickpet area on the last weekend before Diwali. My initial reaction was “these people have lost their minds. Why are they all in such a crowded place?”. Now, after thinking about the surveys, my reaction is “most of these people have most definitely already got covid and recovered. So it’s not THAT crazy”.

Communicating binary forecasts

One silver lining in the madness of the US Presidential election counting is that there are some interesting analyses floating around regarding polling and surveying and probabilities and visualisation. Take this post from Andrew Gelman’s blog, for example:

Suppose our forecast in a certain state is that candidate X will win 0.52 of the two-party vote, with a forecast standard deviation of 0.02. Suppose also that the forecast has a normal distribution.[…]

Then your 68% predictive interval for the candidate’s vote share is [0.50, 0.54], and your 95% interval is [0.48, 0.56].

Now suppose the candidate gets exactly half of the vote. Or you could say 0.499, the point being that he lost the election in that state.

This outcome falls on the boundary of the 68% interval, it’s one standard deviation away from the forecast. In no sense would this be called a prediction error or a forecast failure.

But now let’s say it another way. The forecast gave the candidate an 84% chance of winning! And then he lost. That’s pretty damn humiliating. The forecast failed.

It took me a while to appreciate this. In a binary outcome, if your model predicts 52%, with a standard deviation of 2%, you are in effect predicting a “win” (50% or higher) with a probability of 84%! Somehow I had never thought about it that way.

In any case, this tells you how tricky forecasting a binary outcome is. You might think (based on your sample size) that a 2% standard deviation is reasonable. Except that when the mean of your forecast is close to the barrier (50% in this case), the “reasonable standard deviation” lends a much stronger meaning to your forecast.
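The 84% figure is simply the area of the forecast distribution above the 50% barrier. Here is a minimal sketch using Python’s standard library, under the same normality assumption as the quoted example.

```python
from statistics import NormalDist

# Forecast from the quoted example: vote share ~ Normal(mean 0.52, sd 0.02).
forecast = NormalDist(mu=0.52, sigma=0.02)

# "Winning" means a vote share above 0.50, so take the upper tail.
win_probability = 1 - forecast.cdf(0.50)
print(f"win probability: {win_probability:.0%}")

# The same sd says very different things as the mean moves off the barrier.
for mean in (0.50, 0.51, 0.52, 0.54):
    p = 1 - NormalDist(mu=mean, sigma=0.02).cdf(0.50)
    print(f"mean {mean:.2f} -> win probability {p:.0%}")
```

The barrier sits exactly one standard deviation below the mean, which is why the number is the familiar 84% from the one-sided version of the 68% rule.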

Gelman goes on:

That’s right. A forecast of 0.52 +/- 0.02 gives you an 84% chance of winning.

We want to increase the sd in the above expression so as to send the win probability down to 60%. How much do we need to increase it? Maybe send it from 0.02 to 0.03?

> pnorm(0.52, 0.50, 0.03)
[1] 0.75

Uh, no, that wasn’t enough! 0.04?

> pnorm(0.52, 0.50, 0.04)
[1] 0.69

0.05 won’t do it either. We actually have to go all the way up to . . . 0.08:

> pnorm(0.52, 0.50, 0.08)
[1] 0.60

That’s right. If your best guess is that candidate X will receive 0.52 of the vote, and you want your forecast to give him a 60% chance of winning the election, you’ll have to ramp up the sd to 0.08, so that your 95% forecast interval is a ridiculously wide 0.52 +/- 2*0.08, or [0.36, 0.68].
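Rather than finding 0.08 by trial and error, you can solve for the standard deviation directly by inverting the normal CDF. This is a sketch under the same normality assumption, and the helper name `sd_for_win_probability` is my own.

```python
from statistics import NormalDist

def sd_for_win_probability(mean_share: float, target_win_probability: float,
                           threshold: float = 0.50) -> float:
    """Standard deviation at which a normal forecast centred on `mean_share`
    puts `target_win_probability` of its mass above `threshold`.
    Only meaningful when mean_share > threshold and the target is above 0.5."""
    z = NormalDist().inv_cdf(target_win_probability)
    return (mean_share - threshold) / z

sd = sd_for_win_probability(0.52, 0.60)
print(f"required sd: {sd:.3f}")
print(f"95% interval: [{0.52 - 2 * sd:.2f}, {0.52 + 2 * sd:.2f}]")
```

This gives an sd of about 0.079, which rounds to the 0.08 in the quoted post.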

Who said forecasting an election is easy?