## Confusing with complications

I’m reading this awesome article by Srinivas Bhogle (with Rajeeva Karandikar) on election forecasting. To be fair, not much of the article is new to me – it’s just a far more readable version of Karandikar’s seminal presentation on the topic made at IIT Kanpur all those years back.

However, as with all good retellings, this story also has some nice tidbits. This one has to do with “index of opposition unity”. The voice here is Bhogle’s:

It is easy to understand why the IOU becomes so critical in such situations. But, and here’s the rub, the exact mathematical formula connecting IOU to the seat count prediction is not easy to find. I searched through the big and small print of The Verdict by Dorab Sopariwala and Prannoy Roy, but the formula remained elusive.

Rajeeva suggests that it was likely based on simple heuristics: something like ‘if the IOU is less than 25%, give the first-placed party 75% of the seats.’ It may also have involved intelligent tweaking based on current survey data, historical data, informal feedback, expert opinion, gut feeling, and so on.

I first came across the IOU in Prannoy Roy and Dorab Sopariwala’s book. The way they presented it in the book, it seemed like a “major concept”. It seems that Bhogle, like me, also looked through the book for a precise formula, and failed to find one.

And then Karandikar’s insight above is crucial – that the IOU may not be a precise mathematical formula, but just an intelligent set of heuristics, involving intelligent tweaking.

Sometimes putting a fancy name (or, even better, an acronym) on something can help lend credibility to the concept. For example, IOU is something that has been championed by Roy and Sopariwala for years, and they have done so to a level where it has become a self-fulfilling prophecy, and a respected scientist like Bhogle has gone searching for its formula!

Also, sometimes, telling people that you “used an intelligent heuristic” to come up with a conclusion can lead you to be taken less seriously. Put on a fancy name (even if it is something that you have yourself come up with), and the game changes. You suddenly start to be taken more seriously, like Ganesha assumed when he started sending fan mail under the name “YG Rao”.

And like they say in The Usual Suspects, sometimes the greatest trick that the devil ever pulled was to convince you that he exists. It is the same with “concepts” such as IOU – you THINK they must be sound because they come with a fancy name, when all that they appear to represent is a set of fancy heuristics.

I must say this is excellent marketing.

## ISAs and Power Laws

There are a number of professions where incomes are distributed according to a power law. The most successful people in the professions corner a very large share of the income that people in the profession make, and unless you reach that very high level of success, you might even struggle to make a living wage.

Professions of this nature include the arts (movies, music, drama, standup comedy, painting, sculpture, etc.), sports, writing and entrepreneurship. The thing with such professions is that they need some degree of “socialism” – if people are left to their own devices, then the payoffs they can expect with 99% confidence are so low that few people will enter the profession, and when fewer people enter the profession, the overall quality of the profession goes down.

So what is required in this case is some sort of a safety net – people who are reasonably competent at the profession get paid a sort of regular basic income (could either be one-time, periodic or output-based) by “investors” in exchange for a cut of the upside. And this, for a talented but struggling beginner, is usually a good deal – they are assured a basic income to pursue what they love and think they are good at, and anything they have to pay in return is only probabilistic – contingent upon a heavy degree of success.

And in order for this kind of safety net to work, it is important that the investment be of the nature of “equity” rather than “debt” – the extreme power law nature of these professions is that only a small proportion of the people who get the safety net will be able to pay back, and those that are able to pay back will be able to pay disproportionately large amounts.
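A rough sketch with made-up numbers may help show why debt struggles here. It assumes 1,000 people whose career earnings follow a Pareto distribution (tail index 1.1, minimum earnings of 1 unit), an advance of 0.5 units each, and that only people earning above 20 units can pay anything back. The distribution, the threshold and both contract terms are illustrative assumptions, not data.

```r
# Illustrative numbers only: every parameter below is an assumption.
n <- 1000
alpha <- 1.1                      # Pareto tail index (heavier tail when closer to 1)
u <- (seq_len(n) - 0.5) / n       # evenly spaced quantiles, so no randomness
earnings <- (1 - u)^(-1 / alpha)  # Pareto quantile function, minimum earnings = 1

advance <- 0.5                    # paid to every person up front
success_threshold <- 20           # only people earning above this can pay back
outlay <- n * advance

# Debt: a success repays the advance plus 100% interest; everyone else defaults
debt_recovered <- sum(earnings >= success_threshold) * advance * 2

# Equity: the investor takes 30% of earnings above the success threshold
equity_recovered <- sum(0.3 * pmax(earnings - success_threshold, 0))

# Share of all earnings that goes to the top 1% of earners
top1_share <- sum(sort(earnings, decreasing = TRUE)[1:(n / 100)]) / sum(earnings)

cat("Outlay:", outlay, "\n")
cat("Recovered via debt:", round(debt_recovered), "\n")
cat("Recovered via equity:", round(equity_recovered), "\n")
cat("Share of all earnings going to the top 1%:", round(top1_share, 2), "\n")
```

Under these assumptions the capped repayments from the few successes come nowhere near covering the outlay, while an uncapped share of the same few successes does.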

Entrepreneurship and film acting have sort of done well in terms of providing these safety nets. Entrepreneurs get venture capital investment, which allows them to fund their business and take (nominal) salaries, while working on the thing they hope to make it big in. The venture capitalists make money even when only a small proportion of their investments succeed.

The model in acting is a little different – studios hire actors on long term contracts at negotiated salaries. These salaries give actors the safety net to continue in the profession. And in case the actors become popular, the studios cash out essentially by “encashing the option” of using the actor at the pre-negotiated rate for the duration of the contract.

There are other examples of these safety nets as well – artist studios pay their artists a basic wage, in exchange for a cut on the sale of their paintings. However, the model is not as popular as it seems.

For sportspersons, for example, apart from things like the Ranji Trophy increasing match fees in a big way in the late noughties, this kind of a safety net has been absent. The studio model in acting hasn’t held on. Writers get advances but that doesn’t represent much of a “living wage”.

The good news is that this is changing. Investment in athletes in exchange for a cut of future earnings is gaining traction. And now we have this deal ($):

Taxes will cut into his new 14-year agreement with the Padres, of course. But Tatis also must pay off a previous obligation, a deal he made during the 2017-18 offseason, when he was turning 19 years old and preparing for his first full season at Double A. It was then that Tatis entered into a contract with Big League Advance (BLA), a company that offers select minor leaguers upfront payments in exchange for a percentage of their future earnings in Major League Baseball.

Neither Tatis nor BLA has revealed the exact percentage he owes the company. The company’s president and CEO, former major-league pitcher Michael Schwimer, told The Athletic in April 2018 that BLA uses a proprietary algorithm to value every player in the minors. Players who receive offers can accept a base-level payout in return for 1 percent of their earnings, with the chance to receive greater incremental payouts and pay back a maximum of 10 percent. If a player never reaches the majors, he keeps the cash advance, with no obligation to pay it back.

This is an awesome thing. For a struggling potential sportsperson, a minor investment (in exchange for equity) can provide a huge boost in their chances of making it – hiring coaches, for example, or eating better food, or living more comfortably.

While the media attention will go to the small proportion of investments that do pay off (like how tech media gives disproportionate coverage, and quite rightly so, to startups that do well), arrangements like this mean that more people will play the sport, and the overall standard in the sport will improve.

We need to see if such arrangements start making a mark in the rest of the arts and writing as well.

Oh, and much has been made of income sharing agreements for professional colleges and “tuition centres”. I’m not sure that is the right model there – the thing is that if you are studying to be a software engineer, your payoffs don’t follow a power law. Yes, if you are successful, you make a few orders of magnitude more money than the less successful ones, but even an average software engineer can expect to make a fairly decent income. From that perspective, selling equity in your future earnings to get paid to study engineering is not a great idea, and can lead to adverse selection on the part of the candidates (the better ones will prefer to get funding through debt, which their average salaries can help pay off). In that sense I prefer what the likes of MountBlue are doing, where the “training fees” get paid off by simply working for the company for a certain period of time.

## Monetising volatility

I’m catching up on old newsletters now – a combination of job and taking my email off what is now my daughter’s iPad means I have a considerable backlog – and I found this gem in Matt Levine’s newsletter from two weeks back ($; Bloomberg).

“it comes from monetizing volatility, that great yet under-appreciated resource.”

He is talking about equity derivatives, and says that this is “not such a good explanation”. While it may not be such a good explanation when it comes to equity derivatives themselves, I think it has tremendous potential outside of finance.

I’m reminded of the first time I was working in the logistics industry (back in 2007). I had what I had thought was a stellar idea, which was basically based on monetising volatility, but given that I was in a company full of logistics and technology and operations research people, and no other derivatives people, I had a hard time convincing anyone of that idea.

My way of “monetising volatility” was rather simple – charge people cancellation fees. In the part of the logistics industry I was working in back then, this was (surprisingly, to me) a particularly novel idea. So how do cancellation fees equate to monetising volatility?

Again it’s due to “unbundling”. Let’s say you purchase a train ticket using advance reservation. You are basically buying two things – the OPTION to travel on that particular day using that particular train, sitting on that particular seat, and the cost of the travel itself.

The genius of the airline industry following the deregulation in the US in the late 1970s was that these two costs could be separated. By charging separately for the travel itself and for the option to travel, you can offer the travel itself at a much lower price. Think of the cancellation charge as the “option premium” you pay for the option to travel.

And you can come up with options with different strike prices, and depending upon the strike price, the value of the option itself changes. Since it is the option to travel, it is like a call option, and so the higher the strike price (the price you pay for the travel itself), the lower the price of the option.

This way, you can come up with a repertoire of strike-option combinations – the more you’re willing to pay for cancellation (option premium), the lower the price of the travel itself will be. This is why, for example, the cheapest airline tickets are those that come with close to zero refund on cancellation (though I’ve argued that bringing refunds all the way to zero is not a good idea).
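The inverse relationship between strike and premium can be checked with any standard option pricing model. The sketch below uses the Black-Scholes formula for a European call purely as an illustration. Nothing in the airline setting requires this particular model, and all the inputs (spot of 100, 30% volatility, three months to expiry, zero interest rate) are assumed numbers.

```r
# Black-Scholes price of a European call (used here only for illustration)
bs_call <- function(spot, strike, vol, expiry, rate = 0) {
  d1 <- (log(spot / strike) + (rate + vol^2 / 2) * expiry) / (vol * sqrt(expiry))
  d2 <- d1 - vol * sqrt(expiry)
  spot * pnorm(d1) - strike * exp(-rate * expiry) * pnorm(d2)
}

# Same option, progressively higher strikes (assumed values)
strikes <- seq(80, 120, by = 10)
premiums <- bs_call(spot = 100, strike = strikes, vol = 0.3, expiry = 0.25)
print(data.frame(strike = strikes, premium = round(premiums, 2)))
```

The premium column falls as the strike rises. That is the same trade-off as between the fare and the cancellation charge.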

Since there is uncertainty in whether you can travel at all (there are zillions of reasons why you might want to “cancel tickets”), this is basically about monetising this uncertainty or (in finance terms) “monetising volatility”. Rather than the old (regulated) world where cancellation fees were low and travel charges were high (option itself was not monetised), monetising the options (which is basically a price on volatility) meant that airlines could make more money, AND customers could travel cheaper.
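A back-of-the-envelope version of this argument, with assumed numbers: suppose 20% of bookings get cancelled, a cancelled seat goes unsold, the old fully refundable fare is 100 and the new non-refundable fare is 90. None of these figures come from real airline data.

```r
# All numbers are assumptions chosen to illustrate the argument
p_cancel <- 0.2           # probability that a booking is cancelled
refundable_fare <- 100    # old world: full refund on cancellation
nonrefundable_fare <- 90  # new world: no refund on cancellation

# Expected airline revenue per booking, assuming a cancelled seat goes unsold
revenue_refundable <- (1 - p_cancel) * refundable_fare
revenue_nonrefundable <- nonrefundable_fare

cat("Revenue per booking, refundable fare:", revenue_refundable, "\n")
cat("Revenue per booking, non-refundable fare:", revenue_nonrefundable, "\n")
cat("Saving for a passenger who does travel:",
    refundable_fare - nonrefundable_fare, "\n")
```

In this toy example the airline earns more per booking and the passenger who travels pays less. The difference is paid by those who book and then cancel, who earlier got the option for free.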

It’s like money was being created out of thin air. And that was because we monetised volatility.

I had the same idea for another part of the business, but unfortunately we couldn’t monetise that. My idea was simple – if we charge cancellation fees, our demand will become more predictable (since people won’t book chumma, i.e. just for the sake of it), and this means we will be able to offer a discount. And offering a discount would mean more people would buy, and in the immortal jargon of Silicon Valley, “a flywheel would be set in motion”.

The idea didn’t fly. Maybe I was too junior. Maybe people were suspicious of my brief background in banking. Maybe most people around me had “too much domain knowledge”. So the idea of charging for cancellation in an industry that traditionally didn’t charge for cancellation didn’t fly at all.

Anyway all of that is history.

Now that I’m back in the industry, it remains to be seen if I can come up with such “brilliant” ideas again.

## Uncertainty and Anxiety

A lot of parenting books talk about the value of consistency in parenting – when you are consistent with your approach with something, the theory goes, the child knows what to expect, and so is less anxious about what will happen.

It is not just about children – when something is more deterministic, you can “take it for granted” more. And that means less anxiety about it.

From another realm, prices of options always have “positive vega” – the higher the market volatility, the more the price of the option. Thinking about it another way, the more the uncertainty, the more people are willing to pay to hedge against it. In other words, higher uncertainty means more anxiety.
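The “positive vega” claim can be seen in a deliberately crude one-step model, sketched below. It assumes the underlying ends up either `move` above or `move` below today’s price with equal probability, and looks at an at-the-money call. The model and the numbers are assumptions for illustration, not how options are priced in practice.

```r
# Expected payoff of a call when the underlying ends at spot + move or
# spot - move with equal probability (a deliberately crude model)
expected_call_payoff <- function(spot, strike, move) {
  0.5 * max(spot + move - strike, 0) + 0.5 * max(spot - move - strike, 0)
}

moves <- c(5, 10, 20, 40)   # a larger move stands in for higher volatility
payoffs <- sapply(moves, function(m) expected_call_payoff(100, 100, m))
print(data.frame(move = moves, expected_payoff = payoffs))
```

The downside is capped at zero while the upside grows with the size of the move, so the expected payoff rises with volatility.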

However, sometimes the equation can get flipped. Let us take the case of water supply in my apartment. We have both a tap water connection and a borewell, so historically, water supply has been fairly consistent. For the longest time, we didn’t bother thinking about the pressure of water in the taps.

And then one day in the beginning of this year the water suddenly stopped. We had an inkling of it that morning as the water in the taps inexplicably slowed down, and so stored a couple of buckets until it ground to a complete halt later that day.

It turned out that our water pump, which is way deep inside the earth (near the water table), was broken, and it took a day to fix.

Following that, we have become more cognisant of the water pressure in the pipes. If the water pressure goes down for a bit, the memory of the day when the motor conked is fresh, and we start worrying that the water will suddenly stop. I’ve panicked at least a couple of times wondering if the water will stop.

However, after this happened a few times over the last few months I’m more comfortable. I now know that the water pressure in the tap is variable. When I’m showering at the same time as my downstairs neighbour (I’m guessing), the water pressure will be lower. Sometimes the level of water in the tank is just above the level required for the pump to switch on. Then again the pressure is lower. And so forth.

In other words, observing a moderate level of uncertainty has actually made me more comfortable now and reduced my anxiety – within some limits, I know that some fluctuation is “normal”. This uncertainty is more than what I observed earlier, so in other words, increased (perceived) uncertainty has actually reduced anxiety.

One way I think of it is in terms of hidden risks – when you see moderate fluctuations, you know that fluctuations exist and that you don’t need to get stressed around them. So your anxiety is lower. However, if you’ve gone a very long time with no fluctuation at all, then you are concerned that there are hidden risks that you have not experienced yet.

So when the water pressure in the taps has been completely consistent, then any deviation is a very strong (Bayesian) sign that something is wrong. And that increases anxiety.
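The Bayesian point can be put into numbers. The sketch below applies Bayes’ rule with made-up probabilities: a 1% prior chance that the pump is faulty on any given day, a 90% chance of a pressure dip if it is, and two different chances of a dip when nothing is wrong.

```r
# Posterior probability of a fault given a pressure dip, by Bayes' rule.
# All probabilities are made-up numbers for illustration.
posterior_fault <- function(prior_fault, p_dip_given_fault, p_dip_given_ok) {
  evidence <- p_dip_given_fault * prior_fault + p_dip_given_ok * (1 - prior_fault)
  p_dip_given_fault * prior_fault / evidence
}

# Pressure has never fluctuated before: dips without a fault are very rare
steady <- posterior_fault(0.01, 0.9, 0.001)
# Pressure fluctuates routinely: dips without a fault are common
noisy <- posterior_fault(0.01, 0.9, 0.2)

cat("P(fault | dip), steady history:", round(steady, 2), "\n")
cat("P(fault | dip), noisy history:", round(noisy, 2), "\n")
```

With a history of no fluctuation, a single dip makes a fault very likely. With a history of routine fluctuation, the same dip barely moves the needle.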

## Shooting, investing and the hot hand

A couple of years back I got introduced to “Stumbling and Mumbling”, a blog written by Chris Dillow, who was described to me as a “Marxist investment banker”. I don’t agree with a lot of the stuff in his blog, but it is all very thoughtful.

He appears to be an Arsenal fan, and in his latest post, he talks about “what we can learn from football”. In that, he writes:

These might seem harmless mistakes when confined to talking about football. But they have analogues in expensive mistakes. The hot-hand fallacy leads investors to pile into unit trusts with good recent performance (pdf) – which costs them money as the performance proves unsustainable. Over-reaction leads them to buy stocks at the top of the market and sell at the bottom. Failing to see that low probabilities compound to give us a high one helps explain why so many projects run over time and budget. And so on.

Now, the hot hand fallacy has been a hard problem in statistics for a few years now. Essentially, the intuitive belief in basketball is that someone who has scored a few baskets is more likely to be successful in his next basket (basically, the player is on a “hot hand”).

It all started with a seminal paper by Amos Tversky et al in the 1980s, that used (the then limited) data to show that the hot hand is a fallacy. Then, more recently, Miller and Sanjurjo took another look at the problem and, with far better data at hand, found that the hot hand is actually NOT a fallacy.

There is a nice podcast on The Art of Manliness, where Ben Cohen, who has written a book about hot hands, spoke about the research around it. In any case, there are very valid reasons as to why hot hands exist.

Yet, Dillow is right – while hot hands might exist in something like basketball shooting, they don’t in something like investing. This has to do with how much “control” the person in question has. Let me switch fields completely now and quote a paragraph from Venkatesh Guru Rao’s “The Art Of Gig” newsletter:

As an example, take conducting a workshop versus executing a trade based on some information. A significant part of the returns from a workshop depend on the workshop itself being good or bad. For a trade on the other hand, the returns are good or bad depending on how the world actually behaves. You might have set up a technically perfect trade, but lose because the world does something else. Or you might have set up a sloppy trade, but the world does something that makes it a winning move anyway.

This is from the latest edition, which is paid. Don’t worry if you aren’t a subscriber. The above paragraph I’ve quoted is sufficient for the purpose of this blogpost.

If you are in the business of offering workshops, or shooting baskets, the outcome of the next workshop or basket depends largely upon your own skill. There is randomness, yes, but this randomness is not very large, and the impact of your own effort on the result is large.

In case of investing, however, the effect of the randomness is very large. As VGR writes, “For a trade on the other hand, the returns are good or bad depending on how the world actually behaves”.

So if you are on a hot hand when it comes to investing, it means that “the world behaved in a way that was consistent with your trade” several times in a row. And that the world has behaved according to your trade several times in a row makes it no more likely that the world will behave according to your trade next time.

If, on the other hand, you are on a hot hand in shooting baskets or delivering lectures, then it is likely that this hot hand is because you are performing well. And because you are performing well, the likelihood of you performing well on the next turn is also higher. And so the hot hand theory holds.

So yes, hot hands work, but only in contexts “with a high R Square”, where the doer’s performance explains a large share of the variation in the outcome. In high randomness regimes, such as gambling or trading, the hot hand doesn’t matter.
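A small simulation can illustrate the difference between the two regimes. It is only a sketch under assumed numbers. In the “skill” regime a performer’s form persists (70% success when in form, 30% when out of form, with a 5% chance of switching before each attempt). In the “luck” regime every attempt is an independent coin toss.

```r
set.seed(42)
n <- 200000

# Share of successes among attempts that follow `k` straight successes
success_after_streak <- function(outcomes, k = 3) {
  idx <- (k + 1):length(outcomes)
  on_streak <- rep(TRUE, length(idx))
  for (lag in 1:k) on_streak <- on_streak & (outcomes[idx - lag] == 1)
  mean(outcomes[idx][on_streak])
}

# "Skill" regime: form is persistent, so recent results say something about it
in_form <- cumsum(runif(n) < 0.05) %% 2 == 0
skill <- rbinom(n, 1, ifelse(in_form, 0.7, 0.3))

# "Luck" regime: each outcome is an independent coin toss
luck <- rbinom(n, 1, 0.5)

cat("Skill regime, overall vs after 3 successes:",
    round(mean(skill), 3), round(success_after_streak(skill), 3), "\n")
cat("Luck regime, overall vs after 3 successes:",
    round(mean(luck), 3), round(success_after_streak(luck), 3), "\n")
```

In the skill regime the success rate after a streak should come out clearly above the overall rate. In the luck regime the two should be nearly identical.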

## What is the Case Fatality Rate of Covid-19 in India?

The economist in me will give a very simple answer to that question – it depends. It depends on how long you think people will take from onset of the disease to die.

The modeller in me extended the argument that the economist in me made, and built a rather complicated model. This involved smoothing, assumptions on probability distributions, long mathematical derivations and (for good measure) regressions. And out of all that came this graph, with the assumption that the average person who dies of covid-19 dies 20 days after the thing is detected.

Yes, there is a wide variation across the country. Given that the disease is the same and the treatment for most people diseased is pretty much the same (lots of rest, lots of water, etc), it is weird that the case fatality rate varies by so much across Indian states. There is only one explanation – assuming that deaths can’t be faked or miscounted (covid deaths attributed to other reasons or vice versa), the problem is in the “denominator” – the number of confirmed cases.

What the variation here tells us is that in states towards the top of this graph, we are likely not detecting most of the positive cases (serious cases will get themselves tested anyway, and get hospitalised, and perhaps die. It’s the less serious cases that can “slip”). Taking a state low down in this graph as a “good tester” (say Andhra Pradesh), we can try and estimate what the extent of under-detection of cases in each state is.

Based on state-wise case tallies as of now (might be some error since some states might have reported today’s number and some might not have), here are my predictions on how many actual cases there are per state, based on our calculations of case fatality rate.

Yeah, Maharashtra alone should have crossed a million cases based on the number of people who have died there!

Now let’s get to the maths. It’s messy. First we look at the number of confirmed cases per day and number of deaths per day per state (data from here). Then we smooth the data and take 7-day trailing moving averages. This is to get rid of any reporting pile-ups.

Now comes the probability assumption – we assume that a proportion $p$ of all the confirmed cases will die. We assume an average number of days ($N$) to death for people who are supposed to die (let’s call them Romeos?). They all won’t pop off exactly $N$ days after we detect their infection. Let’s say a proportion $\lambda$ dies each day. Of everyone who is infected, supposed to die and not yet dead, a proportion $\lambda$ will die each day.

My maths has become rather rusty over the years but a derivation I made shows that $\lambda = \frac{1}{N}$. So if people are supposed to die in an average of 20 days, $\frac{1}{20}$ will die today, $\frac{19}{20}\frac{1}{20}$ will die tomorrow. And so on.
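For what it’s worth, the step behind $\lambda = \frac{1}{N}$ is the mean of a geometric distribution. Assuming, as above, that each not-yet-dead Romeo dies on any given day with probability $\lambda$, regardless of how long he has already survived, the probability of dying exactly $k$ days after detection is $(1-\lambda)^{k-1}\lambda$, and the average number of days to death is

$N = \sum_{k=1}^{\infty} k (1-\lambda)^{k-1} \lambda = \frac{\lambda}{(1-(1-\lambda))^2} = \frac{1}{\lambda}$

using the identity $\sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2}$ for $|x| < 1$. For $N = 20$ this gives $\lambda = \frac{1}{20}$.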

So people who die today could be people who were detected with the infection yesterday, or the day before, or the day before day before (isn’t it weird that English doesn’t have a word for this?) or … Now, based on how many cases were detected on each day, and our assumption of $p$ (let’s assume a value first. We can derive it back later), we can know how many people who were found sick $k$ days back are going to die today. Do this for all $k$, and you can model how many people will die today.

The equation will look something like this. Assume $d_t$ is the number of people who die on day $t$ and $n_t$ is the number of cases confirmed on day $t$. We get

$d_t = p (\lambda n_{t-1} + (1-\lambda) \lambda n_{t-2} + (1-\lambda)^2 \lambda n_{t-3} + ... )$

Now, all these $n$s are known. $d_t$ is known. $\lambda$ comes from our assumption of how long people will, on average, take to die once their infection has been detected. So in the above equation, everything except $p$ is known.

And we have this data for multiple days. We know the left hand side. We know the value in brackets on the right hand side. All we need to do is to find $p$, which I did using a simple regression.

And I did this for each state – take the number of confirmed cases on each day, the number of deaths on each day and your assumption on average number of days after detection that a person dies. And you can calculate $p$, which is the case fatality rate. The true proportion of cases that are resulting in deaths.
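To check that the procedure hangs together, here is a minimal sketch in base R on synthetic data. The case counts, the growth rate and the “true” $p$ of 3% are all made up, and `weighted_cases` is a hypothetical helper written for this illustration. Deaths are generated exactly by the equation for $d_t$, so the regression should simply hand back the $p$ that went in.

```r
# Lag-weighted confirmed cases: the bracketed term in the equation for d_t
weighted_cases <- function(cases, lambda, max_delay = 100) {
  weights <- lambda * (1 - lambda)^(seq_len(max_delay) - 1)
  sapply(seq_along(cases), function(day) {
    k <- seq_len(min(max_delay, day - 1))   # delays for which data exists
    sum(weights[k] * cases[day - k])
  })
}

# Synthetic epidemic (assumed numbers): cases grow 5% a day for 120 days
cases <- 100 * 1.05^(0:119)
true_p <- 0.03
lambda <- 1 / 20

x <- weighted_cases(cases, lambda)
deaths <- true_p * x               # deaths generated exactly by the model

# Regression through the origin recovers p
p_hat <- unname(coef(lm(deaths ~ 0 + x)))

# The naive ratio of total deaths to total cases ignores the lag
naive <- sum(deaths) / sum(cases)

cat("Estimated p:", round(p_hat, 4), "\n")
cat("Naive deaths / cases:", round(naive, 4), "\n")
```

On a growing epidemic the naive ratio of total deaths to total cases comes out well below the true $p$, since many recent cases have not yet had time to die.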

This produced the first graph that I’ve presented above, for the assumption that a person, should he die, dies on an average 20 days after the infection is detected.

So what is India’s case fatality rate? While the first graph says it’s 5.8%, the variations by state suggest that many mild cases are going undetected, so the true case fatality rate is likely far lower. From doing my daily updates on Twitter, I’ve come to trust Andhra Pradesh as a state that is testing well, so if we assume they’ve found all their active cases, we use that as a base and arrive at the second graph in terms of the true number of cases in each state.

PS: It’s common to just divide the number of deaths so far by number of cases so far, but that is an inaccurate measure, since it doesn’t take into account the vintage of cases. Dividing deaths by number of cases as of a fixed point of time in the past is also inaccurate since it doesn’t take into account randomness (on when a Romeo might die).

Anyway, here is my code, for what it’s worth.

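    library(dplyr)
    library(tidyr)

    # Estimate the case fatality rate p by regressing smoothed daily deaths on
    # the lag-weighted sum of smoothed daily confirmed cases.
    # covid: wide data, one row per Date and Status, one column per state.
    # avgDays: assumed average number of days from detection to death (N).
    deathRate <- function(covid, avgDays) {
      covid %>%
        mutate(Date = as.Date(Date, '%d-%b-%y')) %>%
        gather(State, Number, -Date, -Status) %>%
        # One column per Status; needed for the Confirmed and Deceased
        # columns used below to exist
        spread(Status, Number) %>%
        arrange(State, Date) ->
        cov1

      # Need to smooth everything by 7 days
      cov1 %>%
        arrange(State, Date) %>%
        group_by(State) %>%
        mutate(
          TotalConfirmed = cumsum(Confirmed),
          TotalDeceased = cumsum(Deceased),
          ConfirmedMA = (TotalConfirmed - lag(TotalConfirmed, 7)) / 7,
          DeceasedMA = (TotalDeceased - lag(TotalDeceased, 7)) / 7
        ) %>%
        ungroup() %>%
        filter(!is.na(ConfirmedMA)) %>%
        select(State, Date, Deceased = DeceasedMA, Confirmed = ConfirmedMA) ->
        cov2

      # Pair each death date with every confirmation date 1 to 100 days earlier
      cov2 %>%
        select(DeathDate = Date, State, Deceased) %>%
        inner_join(
          cov2 %>%
            select(ConfirmDate = Date, State, Confirmed) %>%
            crossing(Delay = 1:100) %>%
            mutate(DeathDate = ConfirmDate + Delay),
          by = c("DeathDate", "State")
        ) %>%
        filter(DeathDate > ConfirmDate) %>%
        arrange(State, desc(DeathDate), desc(ConfirmDate)) %>%
        mutate(
          Lambda = 1 / avgDays,
          # Cases confirmed Delay days ago that die today, per unit of p
          # (this step is reconstructed from the equation for d_t above)
          Expected = Confirmed * Lambda * (1 - Lambda)^(Delay - 1)
        ) %>%
        filter(Deceased > 0) %>%
        group_by(State, DeathDate, Deceased) %>%
        summarise(Expected = sum(Expected)) %>%
        ungroup() %>%
        # Regression through the origin, matching the equation for d_t
        # (the exact form of the regression is an assumption)
        lm(Deceased ~ 0 + Expected, data = .) %>%
        summary() %>%
        broom::tidy() %>%
        pull(estimate)
    }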

## Games of luck and skill

My good friend Anuroop has two hobbies – poker and wildlife photography. And when we invited him to NED Talks some 5 years ago, he decided to combine these two topics into the talk, by speaking about “why wildlife photography is like poker” (or the other way round, I’ve forgotten).

I neither do wildlife photography nor play poker so I hadn’t been able to appreciate his talk in full when he delivered it. However, our trip to Jungle Lodges River Tern Resort (at Bhadra Wildlife Sanctuary) earlier this year demonstrated to me why poker and wildlife photography are similar – they are both “games of luck AND skill”.

One debate that keeps coming up in Indian legal circles is whether a particular card game (poker, rummy, etc.) is a “game of luck” or a “game of skill”. While this might sound esoteric, it is a rather important matter – games of skill don’t need any permission from any authority, while games of luck are banned to different extents by different states (they are seen as being similar to “gambling”, and the moralistic Indian states don’t want to permit that).

Many times in the recent past, courts in India have declared poker and rummy to be “games of skill”, which means “authorities” cannot disrupt any such games. Still, for different reasons, they remain effectively illegal in certain states.

In any case, what makes games like poker interesting is that they combine skill and luck. This is also what makes games like this addictive. That there is skill involved means that you get constantly better over time, and the more you play, the greater the likelihood that you will win (ok it doesn’t increase at the same rate for everyone, and there is occasional regression as well).

If it were a pure game of skill, then things would get boring, since in a game of skill the better player wins every single time. So unless you get a “sparring partner” of approximately your own level, nobody will want to play with you (this is one difficulty with games like chess).

With luck involved, however, the odds change. It is possible to beat someone much better (on average) than you, or lose to someone much worse (on average). In other words, if you are designing an Elo rating system for a game like poker, you need to change players’ ratings by very little after each game (compared to a game of pure skill such as chess).
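The effect of the update size can be seen with the standard Elo formulas, sketched below. The ratings and the two K-factors (32 and 4) are assumptions picked for illustration. The text above only says that the update for a luck-heavy game should be much smaller.

```r
# Standard Elo expected score for player A against player B
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Rating of player A after one game; score is 1 for a win, 0.5 draw, 0 loss
elo_update <- function(rating_a, rating_b, score, k) {
  rating_a + k * (score - elo_expected(rating_a, rating_b))
}

# An upset: a 1500-rated player beats a 1700-rated player
chess_like <- elo_update(1500, 1700, score = 1, k = 32)  # result taken at face value
poker_like <- elo_update(1500, 1700, score = 1, k = 4)   # result mostly put down to luck

cat("After the upset, K = 32:", round(chess_like, 1), "\n")
cat("After the upset, K = 4:", round(poker_like, 1), "\n")
```

With the smaller K the same upset moves the rating by only a few points, so it takes many games before the ratings separate the better player from the luckier one.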

Because there is luck involved, there is “greater information content” in the result of each game (remember from information theory that a perfectly fair coin has the most information content (1 bit) among all coins). And this makes the game more fun to play. And the better player is seen as better only when lots of games are played. And so people want to play more.
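The information-content remark can be made concrete with Shannon entropy. The sketch below computes, for a few assumed win probabilities of the stronger player, how many bits a single result carries.

```r
# Shannon entropy (in bits) of a game the favourite wins with probability p
outcome_entropy <- function(p) {
  terms <- c(p, 1 - p)
  terms <- terms[terms > 0]        # treat 0 * log2(0) as 0
  -sum(terms * log2(terms))
}

win_probs <- c(1, 0.9, 0.7, 0.5)   # from a foregone conclusion to a coin toss
bits <- sapply(win_probs, outcome_entropy)
print(data.frame(favourite_wins = win_probs, bits = round(bits, 3)))
```

When the better player always wins, the result tells you nothing you didn’t already know. The closer the game gets to a coin toss, the more each result is worth watching.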

It is the same with wildlife photography. It is a game of skill because as you do more and more of it, you know where to look for the tigers and leopards (and ospreys and wild dogs). You know where and how long you should wait to maximise your chances of a “sighting”. The more you do it, the better you become at photography as well.

And it is a game of luck because despite your best laid plans, there is a huge amount of luck involved. Just on the day you set up, the tiger might decide to take another path to the river. The osprey might decide on a siesta that is a little bit longer than usual.

At the entrance of JLR River Tern Lodge, there is a board that shows what animals were “sighted” during each safari in the preceding one week. Each day, the resort organises two safaris, one each in the morning and afternoon, and some of them are by boat and some by jeep.

I remember trying to study the boards and divine patterns to decide when we should go by boat and when by jeep (on the second day of our stay there, we were the “longest staying guests” and thus given the choice of safari). On the first evening, in our jeep safari, we saw a herd of elephants. And a herd of gaur. And lots of birds. And a dead deer.

That we had “missed out” on tigers and leopards meant that we wanted to do it again. If what we saw depended solely on the skill of the naturalist and the driver who accompanied us, we would not have been excited to go into the forest again.

However, the element of luck meant that we wanted to just keep going, and going.

Games of pure luck or pure skill can get boring after a while. However, when both luck and skill get involved, they can really really get addictive. Now I fully appreciate Anuroop’s NED Talk.

## I don’t know which 80%

Legendary retailer John Wanamaker (who pioneered fixed price stores in the mid 1800s) is supposed to have said that “half of all advertising is useless. The trouble is I don’t know which half”.

I was playing around with my twitter archive data, and was looking at the distribution of retweets and favourites across all my tweets. To say that it follows a power law is an understatement.

Before this blog post triggers an automated tweet, I have 63,793 tweets, of which 59,275 (93%) have not had a single retweet. 51,717 (81%) have not had a single person liking them. And 50,165 (79%) of all my tweets have not had a single retweet or a favourite.

In other words, nearly 80% of all my tweets had absolutely no impact on the world. They might as well have not existed. Which means that I should cut my time spent tweeting to a fifth. Just that, to paraphrase Wanamaker, I don’t know which four fifths I should eliminate!

There is some good news, though. Over time, the proportion of my tweets that has no impact (in terms of retweets or favourites – the twitter dump doesn’t give me the number of replies to a tweet) has been falling consistently.

Right now, this month, the score is around 33% or so. So even though the proportion of my useless tweets has been dropping over time, even now one in every three tweets that I tweet has zero impact.

My “most impactful tweet” itself accounts for 17% of all retweets that I’ve got. Here I look at what proportion of tweets have accounted for what proportion of “reactions” (reactions for each tweet is defined as the sum of number of retweets and number of favourites. I understand that the same person might have both retweeted and favourited something, but I ignore that bit now).

Notice how extreme the graph is. 0.7% of all my tweets have accounted for 50% of all retweets and likes! 10% of all my tweets have accounted for 90% of all retweets and likes.
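A minimal sketch of how such a concentration number can be computed, assuming the per-tweet reaction counts (retweets plus favourites) are already in a plain list. The function name and the sample data are made up for illustration; they are not from my actual archive.

```python
def share_of_tweets_for(reactions, target_share):
    """Smallest fraction of tweets (taking the biggest first) whose
    reactions add up to at least target_share of all reactions."""
    ordered = sorted(reactions, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no tweet has any reactions")
    running = 0
    for count, r in enumerate(ordered, start=1):
        running += r
        if running >= target_share * total:
            return count / len(ordered)
    return 1.0


# Hypothetical data: ten tweets, six duds and one blockbuster.
reactions = [0, 0, 0, 0, 0, 0, 1, 1, 3, 45]

# Fraction of tweets with no reactions at all.
zero_impact = sum(1 for r in reactions if r == 0) / len(reactions)

print(zero_impact)
print(share_of_tweets_for(reactions, 0.5))
print(share_of_tweets_for(reactions, 0.95))
```

Sorting in descending order and walking down the list until the running total crosses the target is all there is to it; the same function run on the real archive would give the kind of “x% of tweets account for y% of reactions” figures quoted here.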

Even if I look only at recent data, it doesn’t change shape that much – starting from January 2019, 0.8% of my tweets have accounted for 50% of all retweets and likes.

This, I guess, is the fundamental nature of social media. The impact of a particular tweet follows a power law with a very small exponent (meaning highly unequal).

What this also means is that anyone can go viral. Anyone can go from zero to hero in a single day. It is very hard to predict who is going to be a social media sensation some day.

So it’s okay that 80% of my tweets have no traction. I got one blockbuster, and who knows – I might have another some day. I guess such blockbusters are what we live for.

## The World After Overbooking

Why do you think you usually have to wait so much to see a doctor, even when you have an appointment? It is because doctors routinely overbook.

You can think of a doctor’s appointment as being a free option. You call up, give your patient number, and are assigned a slot when the doctor sees you. If you choose to see the doctor at that time, you get the doctor’s services, and then pay for the service. If you choose to not turn up, the doctor’s time in that slot is essentially wasted, since there is nobody else to see then. The doctor doesn’t get compensated for this either.

In order to not waste their time, thus, doctors routinely overbook patients. If the average patient takes fifteen minutes to see, they give appointments once every ten minutes, in the hope of building up a buffer so that their time is not wasted. This way they protect their incomes, and customers pay for this in terms of long waiting hours.
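To see how quickly the waiting builds up, here is a rough simulation. It is only a sketch under some loud assumptions: every patient who turns up arrives exactly on time, every consultation takes exactly the same number of minutes, and no-shows are independent with a fixed probability. The function name and parameters are my own invention for illustration.

```python
import random


def simulate_waits(n_slots, slot_gap, consult_time, show_prob, seed=0):
    """Waiting time in minutes for each patient who turns up.

    Assumes punctual patients and a fixed consultation length.
    """
    rng = random.Random(seed)
    doctor_free_at = 0.0
    waits = []
    for i in range(n_slots):
        appointment = i * slot_gap
        if rng.random() >= show_prob:
            continue  # no-show: nobody to see in this slot
        start = max(appointment, doctor_free_at)
        waits.append(start - appointment)
        doctor_free_at = start + consult_time
    return waits


# Overbooked: a slot every 10 minutes, 15 minutes per patient, 30% no-shows.
overbooked = simulate_waits(30, 10, 15, show_prob=0.7)
print(sum(overbooked) / len(overbooked))

# Realistic spacing with everyone turning up: nobody waits.
print(simulate_waits(5, 15, 15, show_prob=1.0))
```

If everyone turns up to an overbooked schedule, each successive patient waits five minutes longer than the previous one, which is exactly the “long waiting hours” that patients end up paying.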

Now, in the aftermath of the covid crisis, this will need to change. People won’t want to spend long hours in a closed waiting room with scores of other sick people. In an ideal world, doctors will want to not let two of their patients even see each other, since that could mean increased disease transmission.

In the inimitable words of Ravishastri, “something’s got to give”.

One way could be for doctors to simply up their fees and give out appointments at intervals that better reflect the time taken per patient. The problem with this is that there are reputation costs to upping the fee per patient, and doctors simply aren’t conditioned to unexpected breaks between patients. Moreover, fewer slots might mean appointments not being available for several days together, and higher cancellations as well, both problems that doctors want to avoid.

As someone with a background in financial derivatives, I see one obvious thing to tackle – the free option being given to patients in terms of the appointment. What if you were to charge people for making appointments?

Now, taking credit card details at the time of booking is not efficient. However, assuming that most patients a doctor sees are “repeat patients”, just keeping track of who didn’t turn up for appointments can be used to charge them extra on the next visit (this needs to have been made clear in advance, at the time of making the appointment).

My take is that even if this appointment booking cost is trivial (say 5% of the session fee), people are bound to take the appointments more seriously. And when people take their appointments more seriously, the amount of buffer built in by doctors in their schedules can be reduced. Which means they can give out appointments at more realistic intervals. Which also means their income overall is protected, while still maintaining social distancing among patients.

I remember modelling this way back when I was working in air cargo pricing. There again, free options abound. I remember building this model that showed that charging a nominal fee for the options could result in a much lower fee for the actual cargo. A sort of win-win for customers and airlines alike. Needless to say, I was the only ex-derivatives guy around and it proved to be a really hard sell everywhere.

However, the concept remains. When options that have hitherto been free get monetised, it will lead to a win-win situation and significantly superior experience for all parties involved. The only caveat is that the option pricing should be implemented in a manner with as little friction as possible, else transaction costs can overwhelm the efficiency gains.

## More on covid testing

There has been a massive jump in the number of covid-19 positive cases in Karnataka over the last couple of days. Today, there were 44 new cases discovered, and yesterday there were 36. This is a big jump from the average of about 15 cases per day in the preceding 4-5 days.

The good news is that not all of this is new infection. A lot of cases that have come out today are clusters of people who have collectively tested positive. However, there is one bit from yesterday’s cases (again a bunch of clusters) that stands out.

I guess by now everyone knows what “travelled from Delhi” is a euphemism for. The reason they are interesting to me is that they are based on a “repeat test”. In other words, all these people had tested negative the first time they were tested, and then they were tested again yesterday and found positive.

Why did they need a repeat test? That’s because the sensitivity of the Covid-19 test is rather low. Out of every 100 infected people who take the test, only about 70 are found positive (on average) by the test. That also depends upon when the sample is taken.  From the abstract of this paper:

Over the four days of infection prior to the typical time of symptom onset (day 5) the probability of a false negative test in an infected individual falls from 100% on day one (95% CI 69-100%) to 61% on day four (95% CI 18-98%), though there is considerable uncertainty in these numbers. On the day of symptom onset, the median false negative rate was 39% (95% CI 16-77%). This decreased to 26% (95% CI 18-34%) on day 8 (3 days after symptom onset), then began to rise again, from 27% (95% CI 20-34%) on day 9 to 61% (95% CI 54-67%) on day 21.

About one in three infected people (depending upon when you draw the sample) are found by the test to be uninfected. Maybe I should state it again. If you test a covid-19 positive person for covid-19, there is almost a one-third chance that she will be found negative.

The good news (on the face of it) is that the test has “high specificity” of about 97-98%, or a false positive rate of 2-3% (this is from conversations I’ve had with people in the know. I’m unable to find links to corroborate this). That seems rather accurate, except that when the “prior probability” of having the disease is low, even this specificity is not good enough.

Let’s assume that a million Indians are covid-19 positive (the official numbers as of today are a little more than one-hundredth of that number). With one and a third billion people, that represents 0.075% of the population.

Let’s say we were to start “random testing” (as a number of commentators are advocating), and were to pull a random person off the street to test for Covid-19. The “prior” (before testing) likelihood she has Covid-19 is 0.075% (assume we don’t know anything more about her to change this assumption).

If we were to take 20000 such people, 15 of them will have the disease. The other 19985 don’t. Let’s test all 20000 of them.

Of the 15 who have the disease, the test returns “positive” for 10.5 (70% sensitivity; round up to 11). Of the 19985 who don’t have the disease, the test returns “positive” for about 400 of them (let’s assume a specificity of 98%, or a false positive rate of 2%, placing more faith in the test)! In other words, if there were a million Covid-19 positive people in India, and a random Indian were to take the test and test positive, the likelihood she actually has the disease is 10.5/(10.5 + 399.7) = 2.6% (or 11/411 = 2.7% with the rounded counts).

If there were 10 million covid-19 positive people in India (no harm in supposing), then the “base rate” would be .75%. So out of our sample of 20000, 150 would have the disease. Again testing all 20000, 105 of the 150 who have the disease would test positive. 397 of the 19850 who don’t have the disease will test positive. In other words, if there were ten million Covid-19 positive people in India, and a random Indian were to take the test and test positive, the likelihood she actually has the disease is 105/(397+105) = 21%.
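The arithmetic in the last few paragraphs is just Bayes’s Theorem, and can be sketched in a few lines. The function name is mine; the 70% sensitivity, 98% specificity and the one-and-a-third-billion population are the same assumptions used in the text.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(actually infected | tested positive), by Bayes's Theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


population = 4e9 / 3  # "one and a third billion" people

for infected in (1e6, 1e7):
    prior = infected / population
    posterior = positive_predictive_value(prior, sensitivity=0.70, specificity=0.98)
    print(f"{infected:,.0f} infected: prior {prior:.3%}, posterior {posterior:.1%}")
```

Since this works with probabilities rather than head counts, there is no rounding of people along the way, and the results should land close to the 2.6% and 21% worked out by hand.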

###### If there were ten million Covid-19 positive people in India, only one-fifth of the people who tested positive in a random test would actually have the disease.

Take a sip of water (ok I’m reading The Ken’s Beyond The First Order too much nowadays, it seems).

This is all standard maths stuff, and any self-respecting book or course on probability and Bayes’s Theorem will have at least a reference to AIDS or cancer testing. The story goes that this was a big deal in the 1990s when some people suggested that the AIDS test be used widely. Then, once this problem of false positives and posterior probabilities was pointed out, the strategy of only testing “high risk cases” got accepted.

And with a “low incidence” disease like covid-19, effective testing means you test people with a high prior probability. In India, that has meant testing people who travelled abroad, people who have come in contact with other known infected, healthcare workers, people who attended the Tablighi Jamaat conference in Delhi, and so on.

The advantage with testing people who already have a reasonable chance of having the disease is that once the test returns positive, you can be pretty sure they actually have the disease. It is more effective and efficient. Testing people with a “high prior probability of disease” is not discriminatory, or a “sampling bias” as some commentators alleged. It is prudent statistical practice.

Again, as I found to my own detriment with my tweetstorm on this topic the other day, people are bound to see politics and ascribe political motives to everything nowadays. In that sense, a lot of the commentary is not surprising. It’s also not surprising that when “one wing” heavily retweeted my article, “the other wing” made efforts to find holes in my argument (which, again, is textbook math).

One possibly apolitical criticism of my tweetstorm was that “the purpose of random testing is not to find out who is positive. It is to find out what proportion of the population has the disease”. The costs of this (apart from the monetary cost of actually testing) are threefold. Firstly, a large number of uninfected people will get hospitalised in covid-specific hospitals, clogging hospital capacity and increasing the chances that they get infected while in hospital.

Secondly, getting a truly random sample in this case is tricky, and possibly unethical. When you have limited testing capacity, you would be inclined (possibly morally, even) to use it on people who already have a high prior probability.

Finally, when the incidence is small, we need a really large sample to find out the true range.

Let’s say 1 in 1000 Indians have the disease (or about 1.35 million people). Using the Chi Square test of proportions, our estimate of the incidence of the disease varies significantly depending on how many people are tested.

If we test 1000 people and find 1 positive, the true incidence of the disease (95% confidence interval) could be anywhere from 0.01% to 0.65%.

If we test 10000 people and find 10 positive, the true incidence of the disease could be anywhere between 0.05% and 0.2%.

Only if we test 100000 people (a truly massive random sample) and find 100 positive does the true incidence lie between 0.08% and 0.12%, an acceptable range.
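For anyone who wants to play with these ranges, here is a sketch of one way to get them. I am assuming the interval in question is the Wilson score interval with continuity correction, which is a common companion to a chi-square test of proportions; the function name is made up, and other interval methods will give slightly different numbers.

```python
from math import sqrt


def wilson_cc_interval(positives, n, z=1.96):
    """Wilson score interval with continuity correction for a proportion.

    z=1.96 corresponds to a 95% confidence level.
    """
    p = positives / n
    q = 1 - p
    denom = 2 * (n + z * z)
    lower = (2 * n * p + z * z - 1
             - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    upper = (2 * n * p + z * z + 1
             + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    # Pin the bounds at the edges of the valid range.
    if positives == 0:
        lower = 0.0
    if positives == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)


for positives, tested in ((1, 1000), (10, 10000), (100, 100000)):
    low, high = wilson_cc_interval(positives, tested)
    print(f"{positives} of {tested}: {low:.3%} to {high:.3%}")
```

The observed proportion is 0.1% in all three cases; only the sample size changes, and the interval tightens roughly with the square root of the number of people tested.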

I admit that we may not be testing enough. A simple rule of thumb is that anyone with more than a 5% prior probability of having the disease needs to be tested. How we determine this prior probability is again dependent on some rules of thumb.

I’ll close by saying that we should NOT be doing random testing. That would be unethical on multiple counts.