## Risk and data

A while back, a large number of scientists wrote an open letter to the Prime Minister demanding greater data sharing with them. I must say that the letter is written in academic language and the effort to understand it was too much, but in the interest of fairness I’ll put here a screenshot of it that was posted on Twitter.

I don’t know about this clinical and academic data. However, the withholding of one kind of data has, in my opinion, massively (and negatively) impacted people’s mental health and risk calculations.

This is data on mortality and risk. The kinds of questions that I would have expected government data to answer are:

1. If I get covid-19 (now in the second wave), what is the likelihood that I will die?
2. If my oxygen level drops to 90 (>= 94 is “normal”), what is the likelihood that I will die?
3. If I go to hospital, what is the likelihood I will die?
4. If I go to ICU what is the likelihood I will die?
5. What is the likelihood of a teenager who contracts the virus (and is otherwise in good health) dying of the virus?

And so on. Simple risk-based questions whose answers can help people calibrate their lives and take calculated risks to get on with it, without putting themselves and their loved ones in danger.

Instead, all we get from official sources are aggregates: total numbers of people infected, dead, recovered and so on. And it is impossible to infer answers to the “risk questions” from those.

And who fills in the gaps? The media, of course.

I must have discussed “spectacularness bias” on this blog several times before. Basically the idea is that for something to be news, it needs to carry information. And an event carries information if it occurs despite having a low prior probability (or doesn’t occur despite having a high prior probability). As I put it in my lectures, “‘dog bites man’ is not news. ‘man bites dog’ is news”.
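In information-theory terms, the “information” in an event is its surprisal: $-\log_2 p$ bits for an event with prior probability $p$. Here is a minimal Python sketch; the two prior probabilities are made-up numbers chosen only to illustrate the point, not estimates of how often either event actually happens.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information content, in bits, of an event whose prior probability is p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Made-up priors, purely for illustration
print("dog bites man:", surprisal_bits(1 / 2))
print("man bites dog:", surprisal_bits(1 / 1024))
```

A common event (probability one half) carries one bit; a one-in-1,024 event carries ten bits, which is why it makes the news.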

So when we rely on media reports to fill the gaps in our risk systems, we end up learning all the wrong lessons. We learn that one seventeen-year-old boy died of covid despite being otherwise healthy. In the absence of other information, we assume that teenagers are at grave risk from the disease.

Similarly, cases of children looking for ICU beds get forwarded far more than cases of old people looking for ICU beds. In the absence of risk information, we assume that the situation must be grave among children.

Old people dying from covid goes unreported (unless the person was famous in some way or the other), since the information content in that is low. Young people dying gets amplified.

Based on all the reports that we see in the papers and other media (including social media), we get an entirely warped sense of what the risk profile of the disease is. And panic. When we panic, our health gets worse.

Oh, and I haven’t even spoken about bad risk reporting in the media. I saw a report in the Times of India this morning (unable to find a link to it) that said that “young are facing higher mortality in this wave”. Basically the story said that people under 60 account for a far higher proportion of deaths in the second wave than in the first.

Now there are two problems with that story.

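1. A large proportion of over-60s in India are vaccinated, so mortality is likely to be lower in this cohort. That alone raises the share of deaths accounted for by under-60s, even if the risk to any individual under-60 hasn’t changed.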
2. What we need is the likelihood of a person under 60 dying upon contracting covid (P(death | under 60)), NOT the proportion of deaths accounted for by under-60s (P(under 60 | death)). This is the classic “averaging along the wrong axis” that they unleash upon you in the first test of any statistics course.
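To see how the two axes can diverge, here is a small Python sketch with entirely hypothetical numbers (not Indian data). The under-60 fatality rate is held fixed across two waves, and only the over-60 rate falls, as it might with vaccination.

```python
# Hypothetical numbers for illustration only:
# age band -> (confirmed cases, deaths per 1,000 cases)
waves = {
    "wave 1": {"under 60": (800_000, 5), "60+": (200_000, 50)},
    # Same risk for under-60s; risk for over-60s halved (say, by vaccination)
    "wave 2": {"under 60": (800_000, 5), "60+": (200_000, 25)},
}


def under_60_stats(wave):
    """Return (share of all deaths that are under 60, fatality rate among under-60 cases)."""
    deaths = {age: cases * per_1000 // 1000 for age, (cases, per_1000) in wave.items()}
    share = deaths["under 60"] / sum(deaths.values())  # P(under 60 | died): the wrong axis
    rate = deaths["under 60"] / wave["under 60"][0]     # P(died | under 60): the right axis
    return share, rate


for name, wave in waves.items():
    share, rate = under_60_stats(wave)
    print(f"{name}: under-60 share of deaths {share:.0%}, under-60 fatality rate {rate:.1%}")
```

With these made-up numbers, the under-60 share of deaths rises from about 29% to about 44% while the under-60 fatality rate stays at 0.5%: a “young are facing higher mortality” headline without any change in the risk to the young.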

Anyway, so what kind of data would have helped?

1. Age profile of people testing positive, preferably state-wise (anything finer will be noise)
2. Age profile of people dying of covid-19, again state-wise

I’m sure the government collects this data. It’s just that they’re not used to releasing this kind of data, so we’re not getting it. And so we have to rely on the media and its spectacularness bias to get our information. And so we panic.

PS: By no means am I stating that covid-19 is not a risk. All I am stating is that the information we have been given doesn’t help us make good risk decisions.

## Uncertain Rewards

A couple of months back, I read Nir Eyal’s Hooked. I didn’t particularly get hooked on the book – it’s one of those books that should have been a blogpost (or maybe a longform article). However, as part of the “Hooked model” that forms the core of the book, the author talks about the importance of “uncertain rewards”.

The basic idea is that it is easier to get addicted to something when the rewards from it are uncertain. If the rewards are certain, then irrespective of how large they are, there is a chance that you might get bored of them. Uncertainty, on the other hand, makes you curious. It provides you “information” each time you “play the game”. And in the quest for new information (remember that entropy is information?), you keep playing. And you get hooked.
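The entropy point can be made concrete. A reward that is the same every time has zero Shannon entropy, so each “play” tells you nothing new, while a reward drawn from several possible outcomes has positive entropy. A minimal Python sketch, where the reward distributions are illustrative assumptions rather than measurements of any real app:

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution of outcomes."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # p * log2(1/p) is each outcome's probability times its surprisal
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Illustrative reward distributions (assumptions, not data)
certain = [1.0]           # the same reward every single time
mostly_good = [0.8, 0.2]  # good trip 80% of the time, bad trip 20%
four_way = [0.25] * 4     # four equally likely outcomes

for name, dist in (("certain", certain), ("mostly good", mostly_good), ("four-way", four_way)):
    print(f"{name}: {entropy_bits(dist):.2f} bits")
```

The certain reward scores 0 bits, the mostly-good-occasionally-bad mix about 0.72 bits, and four equally likely outcomes 2 bits.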

This plays out in various ways. Alcohol and drugs, for example, sometimes offer “good trips”, and sometimes “bad trips”. The memory of the good trips is the reason why you keep at it, even if you occasionally have bad trips. The uncertain rewards hook you.

It’s the same with social media. This weekend, so far, I’ve had a largely good experience on Twitter. However, last weekend on the platform was a disaster. I’d quickly gotten depressed and stopped. So why did I get back on Twitter this weekend when last weekend was bad? Because of an earlier weekend when it had given me a set of good conversations.

Even last weekend, when I started having a “bad trip” on Twitter, I kept at it, thinking the longer I play the better the chances of having a good trip. Ultimately I just ruined my weekend.

Uncertain rewards are also why, (especially) when we are young, we tolerate abusive romantic partners. Partners who treat you well all the time are boring. And there is no excitement. Abusive partners, on the other hand, treat you like a king/queen at times, and like shit at other times. The extent of the highs and lows means that you get hooked to them. It possibly takes a certain degree of abuse for you to realise that a “steady partner who treats you well” makes for a better long term partner.

Is there a solution to this? I don’t think so. As we learn in either thermodynamics or information theory, entropy or randomness is equal to information. And because we have evolved to learn and get more information, we crave entropy. And so we crave the experiences that give us a lot of entropy, even if that means the occasional bad trip.

Finally, I realise that uncertain rewards are also the reason why religion is addictive. One conversation I used to have a lot with my late mother was when I would say, “why do you keep praying when your prayers weren’t answered the last time?”. And she would quote another time when her prayers WERE answered. It is this uncertain reward of answers to prayers (which, in my opinion, is sheer randomness) that keeps religion “interesting”. And makes it addictive.

## Confusing with complications

I’m reading this awesome article by Srinivas Bhogle (with Rajeeva Karandikar) on election forecasting. To be fair, not much of the article is new to me – it’s just a far more readable version of Karandikar’s seminal presentation on the topic made at IIT Kanpur all those years back.

However, as with all good retellings, this story also has some nice tidbits. This one has to do with the “index of opposition unity” (IOU). The voice here is Bhogle’s:

It is easy to understand why the IOU becomes so critical in such situations. But, and here’s the rub, the exact mathematical formula connecting IOU to the seat count prediction is not easy to find. I searched through the big and small print of The Verdict by Dorab Sopariwala and Prannoy Roy, but the formula remained elusive.

Rajeeva suggests that it was likely based on simple heuristics: something like ‘if the IOU is less than 25%, give the first-placed party 75% of the seats.’ It may also have involved intelligent tweaking based on current survey data, historical data, informal feedback, expert opinion, gut feeling, and so on.

I first came across the IOU in Prannoy Roy and Dorab Sopariwala’s book. The way they had presented it in the book, it seemed like a “major concept”. It seems that, like me, Bhogle also looked through the book trying to find a precise formula, and failed to do so.

And this is why Karandikar’s insight above is crucial: the IOU may not be a precise mathematical formula, but just a set of intelligent heuristics, with some tweaking on top.

Sometimes putting a fancy name (or, even better, an acronym) on something can help lend credibility to the concept. For example, IOU is something that has been championed by Roy and Sopariwala for years, and they have done so to a level where it has become a self-fulfilling prophecy, and a respected scientist like Bhogle has gone searching for its formula!

Also, sometimes, telling people that you “used an intelligent heuristic” to come up with a conclusion can lead you to be taken less seriously. Put on a fancy name (even if it is something that you have yourself come up with), and the game changes. You suddenly start to be taken more seriously, like Ganesha assumed when he started sending fan mail under the name “YG Rao”.

And like they say in The Usual Suspects, sometimes the greatest trick that the devil ever pulled was to convince you that he exists. It is the same with “concepts” such as IOU – you THINK they must be sound because they come with a fancy name, when all that they appear to represent is a set of fancy heuristics.

I must say this is excellent marketing.

## ISAs and Power Laws

There are a number of professions where incomes are distributed according to a power law. The most successful people in the professions corner a very large share of the income that people in the profession make, and unless you reach that very high level of success, you might even struggle to make a living wage.
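As a rough illustration of how concentrated a power law can be, assume (purely for illustration) that incomes follow a Pareto distribution with tail index $\alpha > 1$. The share of total income going to the top fraction $q$ of earners is then $q^{1 - 1/\alpha}$. The Python sketch below evaluates it; the $\alpha$ values are assumptions, not estimates for any real profession.

```python
import math


def top_share(q: float, alpha: float) -> float:
    """Share of total income earned by the top fraction q of earners when
    incomes are Pareto-distributed with tail index alpha (> 1)."""
    if not (0 < q <= 1 and alpha > 1):
        raise ValueError("need 0 < q <= 1 and alpha > 1")
    return q ** (1 - 1 / alpha)


heavy_tail = math.log(5, 4)  # about 1.16: the classic "top 20% earn 80%" case
thin_tail = 3.0              # a much less extreme distribution, for contrast

for label, alpha in (("heavy tail", heavy_tail), ("thin tail", thin_tail)):
    print(f"{label}: top 20% earn {top_share(0.2, alpha):.0%}, "
          f"top 1% earn {top_share(0.01, alpha):.0%}")
```

With the heavy tail, the top 1% take roughly half of all income; with the thin tail, they take about 5%.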

Professions of this nature include the arts (movies, music, drama, standup comedy, painting, sculpture, etc.), sports, writing and entrepreneurship. The thing with such professions is that they need some degree of “socialism” – if people are left to their own devices, then the 99% confidence payoffs will mean that few people will enter the profession, and when fewer people enter the profession, the overall quality of the profession goes down.

So what is required in this case is some sort of a safety net – people who are reasonably competent at the profession get paid a sort of regular basic income (could either be one-time, periodic or output-based) by “investors” in exchange for a cut of the upside. And this, for a talented but struggling beginner, is usually a good deal – they are assured a basic income to pursue what they love and think they are good at, and anything they have to pay in return is only probabilistic – contingent upon a heavy degree of success.

And in order for this kind of safety net to work, it is important that the investment be of the nature of “equity” rather than “debt”. The extreme power-law nature of these professions means that only a small proportion of the people who get the safety net will be able to pay back, and those who are able to pay back will be able to pay disproportionately large amounts.
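A toy calculation shows why. The Python sketch below uses a hypothetical cohort of 100 beginners, each given an advance of 10 units; the earnings distribution, the 20% interest rate on the debt and the 10% equity share are all made-up assumptions. Under debt, nobody pays back more than principal plus interest, and most pay nothing; under equity, the one big success pays for everyone.

```python
# Hypothetical cohort of 100 beginners, each given an advance of 10 units
ADVANCE = 10
cohort_earnings = [0] * 90 + [100] * 9 + [10_000]  # power-law-ish lifetime earnings


def debt_repayment(earnings: int, advance: int = ADVANCE, interest_pct: int = 20) -> int:
    """Principal plus interest, but nobody can repay more than they earn."""
    owed = advance + advance * interest_pct // 100
    return min(earnings, owed)


def equity_repayment(earnings: int, share_pct: int = 10) -> float:
    """A fixed share of whatever is earned (nothing if you never make it)."""
    return earnings * share_pct / 100


invested = ADVANCE * len(cohort_earnings)
debt_total = sum(debt_repayment(e) for e in cohort_earnings)
equity_total = sum(equity_repayment(e) for e in cohort_earnings)
print(f"invested {invested}, repaid as debt {debt_total}, repaid as equity {equity_total}")
```

The investor puts in 1,000 units. Debt returns 120 (the 90 who never make it repay nothing, and the winners’ repayments are capped at 12 each), while the 10% equity stake returns 1,090, almost all of it from the single star.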

Entrepreneurship and film acting have sort of done well in terms of providing these safety nets. Entrepreneurs get venture capital investment, which allows them to fund their business and take (nominal) salaries, while working on the thing they hope to make it big in. The venture capitalists make money even if only a small proportion of their investments succeed.

The model in acting is a little different: studios hire actors on long term contracts at negotiated salaries. These salaries give actors the safety net to continue in the profession. And in case the actors become popular, the studios cash out essentially by “encashing the option” of using the actor at the pre-negotiated rate for the duration of the contract.

There are other examples of these safety nets as well – artist studios pay their artists a basic wage, in exchange for a cut on the sale of their paintings. However, the model is not as popular as it seems.

For sportspersons, for example, apart from things like the Ranji Trophy increasing match fees in a big way in the late noughties, this kind of a safety net has been absent. The studio model in acting hasn’t held on. Writers get advances but that doesn’t represent much of a “living wage”.

The good news is that this is changing. Investment in athletes in exchange for a cut of future earnings is gaining traction. And now we have this deal ($):

Taxes will cut into his new 14-year agreement with the Padres, of course. But Tatis also must pay off a previous obligation, a deal he made during the 2017-18 offseason, when he was turning 19 years old and preparing for his first full season at Double A. It was then that Tatis entered into a contract with Big League Advance (BLA), a company that offers select minor leaguers upfront payments in exchange for a percentage of their future earnings in Major League Baseball. Neither Tatis nor BLA has revealed the exact percentage he owes the company. The company’s president and CEO, former major-league pitcher Michael Schwimer, told The Athletic in April 2018 that BLA uses a proprietary algorithm to value every player in the minors. Players who receive offers can accept a base-level payout in return for 1 percent of their earnings, with the chance to receive greater incremental payouts and pay back a maximum of 10 percent. If a player never reaches the majors, he keeps the cash advance, with no obligation to pay it back.

This is an awesome thing. For a struggling potential sportsperson, a minor investment (in exchange for equity) can provide a huge boost in their chances of making it – hiring coaches, for example, or eating better food, or living more comfortably. While the media attention will go to the small proportion of investments that do pay off (like how tech media gives disproportionate coverage, and quite rightly so, to startups that do well), arrangements like this mean that more people will play the sport, and the overall standard in the sport will improve. We need to see if such arrangements start making a mark in the rest of the arts and writing as well.

Oh, and much has been made of income sharing agreements for professional colleges and “tuition centres”. I’m not sure that is the right model there – the thing is that if you are studying to be a software engineer, your payoffs don’t follow a power law. Yes, if you are successful, you make a few orders of magnitude more money than the less successful ones, but even an average software engineer can expect to make a fairly decent income. From that perspective, selling equity in your future earnings to get paid to study engineering is not a great idea, and can lead to adverse selection on the part of the candidates (the better ones will prefer to get funding through debt, which their average salaries can help pay off). In that sense I prefer what the likes of MountBlue are doing, where the “training fees” get paid off by simply working for the company for a certain period of time.

## Monetising volatility

I’m catching up on old newsletters now – a combination of job and taking my email off what is now my daughter’s iPad means I have a considerable backlog – and I found this gem in Matt Levine’s newsletter from two weeks back ($; Bloomberg).

“it comes from monetizing volatility, that great yet under-appreciated resource.”

He is talking about equity derivatives, and says that this is “not such a good explanation”. While it may not be such a good explanation when it comes to equity derivatives themselves, I think it has tremendous potential outside of finance.

I’m reminded of the first time I was working in the logistics industry (back in 2007). I had what I thought was a stellar idea, based on monetising volatility, but since I was in a company full of logistics, technology and operations research people, with no other derivatives people, I had a hard time convincing anyone of it.

My way of “monetising volatility” was rather simple – charge people cancellation fees. In the part of the logistics industry I was working in back then, this was (surprisingly, to me) a particularly novel idea. So how do cancellation fees equate to monetising volatility?

Again it’s due to “unbundling”. Let’s say you purchase a train ticket using advance reservation. You are basically buying two things – the OPTION to travel on that particular day using that particular train, sitting on that particular seat, and the cost of the travel itself.

The genius of the airline industry following the deregulation in the US in the 1980s was realising that these two costs could be separated. By charging separately for the travel itself and for the option to travel, you can offer the travel itself at a much lower price. Think of the cancellation charge as the “option premium” you pay for the option to travel.

And you can come up with options with different strike prices, and depending upon the strike price, the value of the option itself changes. Since it is the option to travel, it is like a call option, and so the higher the strike price (the price you pay for the travel itself), the lower the price of the option.
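To see the trade-off between strike and premium in numbers, here is a Python sketch that prices the “option to travel” as a European call under Black-Scholes. This is purely an analogy of my choosing: airlines don’t price tickets this way, and the underlying value of 100, the 50% volatility and the 30-day horizon are all hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def bs_call(spot: float, strike: float, vol: float, t: float, rate: float = 0.0) -> float:
    """Black-Scholes price of a European call option (t in years)."""
    if spot <= 0 or strike <= 0 or vol <= 0 or t <= 0:
        raise ValueError("spot, strike, vol and t must be positive")
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


# Hypothetical: the "value of the trip" is 100 today, uncertain with 50%
# annualised volatility, and the trip is 30 days away. Strike = fare paid to travel.
for fare in (60, 80, 100, 120):
    premium = bs_call(spot=100, strike=fare, vol=0.5, t=30 / 365)
    print(f"fare {fare:>3}: option premium {premium:6.2f}")
```

The premium falls steadily as the fare (strike) rises, giving a menu that runs from cheap fare with a high option premium to expensive fare with a low one.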

This way, you can come up with a repertoire of strike-option combinations – the more you’re willing to pay for cancellation (option premium), the lower the price of the travel itself will be. This is why, for example, the cheapest airline tickets are those that come with close to zero refund on cancellation (though I’ve argued that bringing refunds all the way to zero is not a good idea).

Since there is uncertainty in whether you can travel at all (there are zillions of reasons why you might want to “cancel tickets”), this is basically about monetising this uncertainty or (in finance terms) “monetising volatility”. Compared with the old (regulated) world, where cancellation fees were low and travel charges were high (the option itself was not monetised), monetising the options (which is basically putting a price on volatility) meant that airlines could make more money, AND customers could travel cheaper.

It’s like money was being created out of thin air. And that was because we monetised volatility.

I had the same idea for another part of the business, but unfortunately we couldn’t monetise that. My idea was simple – if we charge cancellation fees, our demand will become more predictable (since people won’t book casually, or “chumma”), and this means we will be able to offer a discount. And offering a discount would mean more people would book, adding to this more predictable demand, and in the immortal jargon of Silicon Valley, “a flywheel would be set in motion”.

The idea didn’t fly. Maybe I was too junior. Maybe people were suspicious of my brief background in banking. Maybe most people around me had “too much domain knowledge”. So the idea of charging for cancellation in an industry that traditionally didn’t charge for cancellation didn’t fly at all.

Anyway all of that is history.

Now that I’m back in the industry, it remains to be seen if I can come up with such “brilliant” ideas again.

## Uncertainty and Anxiety

A lot of parenting books talk about the value of consistency in parenting – when you are consistent with your approach with something, the theory goes, the child knows what to expect, and so is less anxious about what will happen.

It is not just about children – when something is more deterministic, you can “take it for granted” more. And that means less anxiety about it.

From another realm, prices of options always have “positive vega” – the higher the market volatility, the higher the price of the option. Thinking about it another way, the more the uncertainty, the more people are willing to pay to hedge against it. In other words, higher uncertainty means more anxiety.
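A toy two-outcome model shows where positive vega comes from: an option pays off on the upside but is floored at zero on the downside, so widening the spread of outcomes (with the same average) can only add value. The Python sketch below is a simplification of my own, not a real pricing model, and the numbers are arbitrary.

```python
def call_value_two_point(spot: float, strike: float, spread: float) -> float:
    """Expected payoff of a call when the underlying ends at spot + spread or
    spot - spread with equal probability (toy model: no discounting)."""
    up = max(spot + spread - strike, 0)
    down = max(spot - spread - strike, 0)
    return 0.5 * up + 0.5 * down


# Same average outcome (100) in every case; only the uncertainty grows
spreads = (0, 5, 10, 20)
values = [call_value_two_point(100, 100, s) for s in spreads]
print(dict(zip(spreads, values)))
```

An at-the-money call that is worth nothing with no uncertainty is worth 2.5, 5 and 10 as the spread grows to 5, 10 and 20.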

However, sometimes the equation can get flipped. Let us take the case of water supply in my apartment. We have both a tap water connection and a borewell, so historically, water supply has been fairly consistent. For the longest time, we didn’t bother thinking about the pressure of water in the taps.

And then one day at the beginning of this year the water suddenly stopped. We had an inkling of it that morning as the water in the taps inexplicably slowed down, and so we stored a couple of buckets before it ground to a complete halt later that day.

It turned out that our water pump, which is way deep inside the earth (near the water table), was broken, so it took a day to fix.

Following that, we have become more cognisant of the water pressure in the pipes. If the water pressure goes down for a bit, the memory of the day when the motor conked out is fresh, and we start worrying that the water will suddenly stop. I’ve panicked at least a couple of times wondering if the water will stop.

However, after this happened a few times over the last few months, I’m more comfortable. I now know that the water pressure in the tap varies. When I’m showering at the same time as my downstairs neighbour (I’m guessing), the water pressure will be lower. Sometimes the level of water in the tank is just above the level required for the pump to switch on. Then again the pressure is lower. And so forth.

In other words, observing a moderate level of uncertainty has actually made me more comfortable now and reduced my anxiety – within some limits, I know that some fluctuation is “normal”. This uncertainty is more than what I observed earlier, so increased (perceived) uncertainty has actually reduced anxiety.

One way I think of it is in terms of hidden risks – when you see moderate fluctuations, you know that fluctuations exist and that you don’t need to get stressed around them. So your anxiety is lower. However, if you’ve gone a very long time with no fluctuation at all, then you are concerned that there are hidden risks that you have not experienced yet.

So when the water pressure in the taps has been completely consistent, then any deviation is a very strong (Bayesian) sign that something is wrong. And that increases anxiety.
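The Bayesian argument can be sketched with Bayes’ rule. In the Python below, every probability is an illustrative assumption: a 1% prior that the pump is failing, a 90% chance that a failing pump shows a pressure dip, and two histories that differ only in how often dips happen when all is well.

```python
def posterior_fault(prior_fault: float, p_dip_if_fault: float, p_dip_if_normal: float) -> float:
    """P(pump fault | pressure dip), by Bayes' rule."""
    evidence = prior_fault * p_dip_if_fault + (1 - prior_fault) * p_dip_if_normal
    return prior_fault * p_dip_if_fault / evidence


# All probabilities below are made-up illustrative assumptions
PRIOR_FAULT = 0.01     # chance the pump is failing on any given day
P_DIP_IF_FAULT = 0.9   # a failing pump almost always shows a dip

# History with no fluctuations: a dip is very unusual when all is well
calm_history = posterior_fault(PRIOR_FAULT, P_DIP_IF_FAULT, p_dip_if_normal=0.001)
# History with routine fluctuations: dips are common even when all is well
noisy_history = posterior_fault(PRIOR_FAULT, P_DIP_IF_FAULT, p_dip_if_normal=0.3)
print(f"calm history: {calm_history:.2f}, noisy history: {noisy_history:.2f}")
```

With a calm history, the same dip pushes the probability of a fault to about 0.90; with a history of routine fluctuations, it only reaches about 0.03.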

## Shooting, investing and the hot hand

A couple of years back I got introduced to “Stumbling and Mumbling”, a blog written by Chris Dillow, who was described to me as a “Marxist investment banker”. I don’t agree with a lot of the stuff in his blog, but it is all very thoughtful.

He appears to be an Arsenal fan, and in his latest post, he talks about “what we can learn from football”. In that, he writes:

These might seem harmless mistakes when confined to talking about football. But they have analogues in expensive mistakes. The hot-hand fallacy leads investors to pile into unit trusts with good recent performance (pdf) – which costs them money as the performance proves unsustainable. Over-reaction leads them to buy stocks at the top of the market and sell at the bottom. Failing to see that low probabilities compound to give us a high one helps explain why so many projects run over time and budget. And so on.

Now, the hot hand fallacy has been a hard problem in statistics for a few years now. Essentially, the intuitive belief in basketball is that someone who has scored a few baskets is more likely to be successful with his next shot (basically, the player has a “hot hand”).

It all started with a seminal paper by Amos Tversky et al. in the 1980s, which used (the then limited) data to show that the hot hand is a fallacy. Then, more recently, Miller and Sanjurjo took another look at the problem, identified a subtle statistical bias in the way the original analysis measured streaks, and found that once this bias is corrected for, the hot hand is actually NOT a fallacy.
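The bias Miller and Sanjurjo identified can be checked by brute force. Take a shooter with no hot hand at all, hitting each shot independently with probability 1/2. Over a short run of shots, the average proportion of hits that follow a hit is below 1/2, so comparing that proportion with 50%, as a naive test does, is tilted against finding a hot hand. A Python sketch enumerating every 4-shot sequence (the sequence length is my choice):

```python
from fractions import Fraction
from itertools import product


def expected_hit_after_hit(n_shots: int) -> Fraction:
    """Average, over all equally likely sequences of n fair 50/50 shots, of the
    proportion of hits that immediately follow a hit. Sequences with no hit in
    the first n-1 shots are skipped, since the proportion is undefined there."""
    proportions = []
    for seq in product("HM", repeat=n_shots):  # H = hit, M = miss
        followers = [seq[i + 1] for i in range(n_shots - 1) if seq[i] == "H"]
        if followers:
            proportions.append(Fraction(followers.count("H"), len(followers)))
    return sum(proportions) / len(proportions)


result = expected_hit_after_hit(4)
print(result, float(result))
```

For four shots, the expected proportion comes out as exactly 17/42, about 0.405, rather than 0.5.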

There is a nice podcast on The Art of Manliness, where Ben Cohen, who has written a book about hot hands, spoke about the research around it. In any case, there are very valid reasons as to why hot hands exist.

Yet, Dillow is right – while hot hands might exist in something like basketball shooting, they don’t in something like investing. This has to do with how much “control” the person in question has. Let me switch fields completely now and quote a paragraph from Venkatesh Guru Rao’s “The Art Of Gig” newsletter:

As an example, take conducting a workshop versus executing a trade based on some information. A significant part of the returns from a workshop depend on the workshop itself being good or bad. For a trade on the other hand, the returns are good or bad depending on how the world actually behaves. You might have set up a technically perfect trade, but lose because the world does something else. Or you might have set up a sloppy trade, but the world does something that makes it a winning move anyway.

This is from the latest edition, which is paid. Don’t worry if you aren’t a subscriber. The above paragraph I’ve quoted is sufficient for the purpose of this blogpost.

If you are in the business of offering workshops, or shooting baskets, the outcome of the next workshop or basket depends largely upon your own skill. There is randomness, yes, but this randomness is not very large, and the impact of your own effort on the result is large.

In case of investing, however, the effect of the randomness is very large. As VGR writes, “For a trade on the other hand, the returns are good or bad depending on how the world actually behaves”.

So if you are in a hot hand when it comes to investing, it means that “the world behaved in a way that was consistent with your trade” several times in a row. And that the world has behaved according to your trade several times in a row makes it no more likely that the world will behave according to your trade next time.

If, on the other hand, you are on a hot hand in shooting baskets or delivering lectures, then it is likely that this hot hand is because you are performing well. And because you are performing well, the likelihood of you performing well on the next turn is also higher. And so the hot hand theory holds.

So yes, hot hands work, but only in contexts “with a high R Square”, where the impact of the doer’s performance on the outcome is large compared to that of randomness. In high randomness regimes, such as gambling or trading, the hot hand doesn’t matter.
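One way to make the R-squared argument concrete is a simulation. In the Python sketch below, which is my own toy model rather than anything from Dillow or VGR, each outcome is a persistent “form” term plus independent luck. The correlation between consecutive outcomes (a continuous stand-in for “a hit makes the next hit more likely”) should come out at roughly persistence times R squared, where R squared is the share of outcome variance explained by form. The persistence of 0.9 and the two noise levels are assumptions.

```python
import random


def lag1_correlation(xs):
    """Pearson correlation between each value in a series and the next one."""
    a, b = xs[:-1], xs[1:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def simulate_lag1(form_sd, noise_sd, persistence=0.9, n=100_000, seed=42):
    """Outcome_t = form_t + luck_t, where form follows a stationary AR(1) process
    with standard deviation form_sd and luck is independent Gaussian noise."""
    rng = random.Random(seed)
    shock_sd = form_sd * (1 - persistence ** 2) ** 0.5  # keeps Var(form) = form_sd**2
    form = rng.gauss(0, form_sd)
    outcomes = []
    for _ in range(n):
        form = persistence * form + rng.gauss(0, shock_sd)
        outcomes.append(form + rng.gauss(0, noise_sd))
    return lag1_correlation(outcomes)


def theoretical_lag1(form_sd, noise_sd, persistence=0.9):
    """persistence * R^2, with R^2 = Var(form) / Var(outcome)."""
    r_squared = form_sd ** 2 / (form_sd ** 2 + noise_sd ** 2)
    return persistence * r_squared


for label, noise_sd in (("shooting (skill-heavy)", 1.0), ("trading (luck-heavy)", 10.0)):
    print(f"{label}: simulated {simulate_lag1(1.0, noise_sd):.3f}, "
          f"theory {theoretical_lag1(1.0, noise_sd):.3f}")
```

With equal form and luck variance (R squared of 0.5), the lag-one correlation is close to 0.45; when luck has a hundred times the variance of form, it is under 0.01, and a streak tells you next to nothing.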

## What is the Case Fatality Rate of Covid-19 in India?

The economist in me will give a very simple answer to that question – it depends. It depends on how long you think people will take from onset of the disease to die.

The modeller in me extended the argument that the economist in me made, and built a rather complicated model. This involved smoothing, assumptions on probability distributions, long mathematical derivations and (for good measure) regressions. And out of all that came this graph, with the assumption that the average person who dies of covid-19 dies 20 days after the infection is detected.

Yes, there is a wide variation across the country. Given that the disease is the same and the treatment for most infected people is pretty much the same (lots of rest, lots of water, etc.), it is weird that the case fatality rate varies so much across Indian states. There is only one explanation – assuming that deaths can’t be faked or miscounted (covid deaths attributed to other reasons or vice versa), the problem is in the “denominator” – the number of confirmed cases.

What the variation here tells us is that in states towards the top of this graph, we are likely not detecting most of the positive cases (serious cases will get themselves tested anyway, and get hospitalised, and perhaps die; it’s the less serious cases that can “slip”). Taking a state low down in this graph as a “good tester” (say Andhra Pradesh), we can try and estimate the extent of under-detection of cases in each state.
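The scaling step can be sketched as follows, assuming (as in the text) that the reference state detects essentially all its cases and that the true case fatality rate is the same everywhere. A state’s confirmed count is then scaled up by the ratio of its measured CFR to the reference CFR; equivalently, the state’s deaths are divided by the reference CFR. The state names and numbers in this Python sketch are hypothetical.

```python
def implied_true_cases(confirmed: float, state_cfr: float, reference_cfr: float) -> float:
    """Confirmed cases scaled up by how much the state's CFR exceeds the CFR of a
    reference state assumed to detect (nearly) all of its cases."""
    return confirmed * state_cfr / reference_cfr


REFERENCE_CFR = 0.01  # hypothetical CFR of the "good tester" reference state
states = {  # hypothetical: state -> (confirmed cases, measured CFR)
    "State A": (100_000, 0.01),
    "State B": (50_000, 0.04),
}
for name, (confirmed, cfr) in states.items():
    print(name, round(implied_true_cases(confirmed, cfr, REFERENCE_CFR)))
```

A state with four times the reference CFR is inferred to have four times as many true cases as it has confirmed.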

Based on state-wise case tallies as of now (there might be some error since some states might have reported today’s numbers and some might not have), here are my estimates of the actual number of cases in each state, based on our calculations of case fatality rate.

Yeah, Maharashtra alone should have crossed a million cases based on the number of people who have died there!

Now let’s get to the maths. It’s messy. First we look at the number of confirmed cases per day and number of deaths per day per state (data from here). Then we smooth the data and take 7-day trailing moving averages. This is to get rid of any reporting pile-ups.
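The smoothing step, as the R code at the end of this post does it, takes the difference of cumulative totals seven days apart and divides by seven, which equals the average of the last seven daily values. A Python equivalent, sketched for illustration:

```python
from itertools import accumulate


def trailing_moving_average(daily, window=7):
    """(total[t] - total[t - window]) / window on cumulative totals, i.e. the mean
    of the `window` most recent daily values ending at day t. Days with no
    total[t - window] available are returned as None (NA in the R code)."""
    totals = list(accumulate(daily))
    return [
        (totals[t] - totals[t - window]) / window if t >= window else None
        for t in range(len(daily))
    ]


# A hypothetical reporting pile-up: 14 cases dumped on a single day
daily = [0, 7, 0, 0, 14, 0, 0, 7, 7]
print(trailing_moving_average(daily))
```

The pile-up gets spread out, and the first seven days, which have no earlier total to subtract, are dropped.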

Now comes the probability assumption – we assume that a proportion $p$ of all the confirmed cases will die. We assume an average number of days ($N$) to death for people who are supposed to die (let’s call them Romeos?). They won’t all pop off exactly $N$ days after we detect their infection. Let’s say a proportion $\lambda$ dies each day: of everyone who is infected, supposed to die and not yet dead, a proportion $\lambda$ will die each day.

My maths has become rather rusty over the years but a derivation I made shows that $\lambda = \frac{1}{N}$. So if people are supposed to die in an average of 20 days, $\frac{1}{20}$ will die today, $\frac{19}{20}\frac{1}{20}$ will die tomorrow. And so on.
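The claim $\lambda = \frac{1}{N}$ follows because the day of death is then geometrically distributed: the probability of dying on day $k$ after detection is $\lambda(1-\lambda)^{k-1}$, whose mean is $\sum_k k\lambda(1-\lambda)^{k-1} = \frac{1}{\lambda}$. A quick numerical check in Python, truncating the infinite sum at a long horizon of my choosing:

```python
def death_day_probabilities(avg_days: float, horizon: int) -> list:
    """P(a 'Romeo' dies exactly k days after detection), for k = 1..horizon,
    when a fraction lambda = 1/avg_days of those still alive die each day."""
    lam = 1 / avg_days
    return [lam * (1 - lam) ** (k - 1) for k in range(1, horizon + 1)]


probs = death_day_probabilities(20, horizon=2000)  # long horizon stands in for infinity
mean_delay = sum(k * p for k, p in enumerate(probs, start=1))
print(probs[0], probs[1], round(mean_delay, 6))
```

The first two probabilities are 1/20 and 19/400, matching the text, and the mean delay comes out at 20 days.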

So people who die today could be people who were detected with the infection yesterday, or the day before, or the day before the day before (isn’t it weird that English doesn’t have a word for this?), or … Now, based on how many cases were detected on each day, and our assumption of $p$ (let’s assume a value first; we can derive it back later), we can know how many people who were found sick $k$ days back are going to die today. Do this for all $k$, and you can model how many people will die today.

The equation will look something like this. Assume $d_t$ is the number of people who die on day $t$ and $n_t$ is the number of cases confirmed on day $t$. We get

$d_t = p (\lambda n_{t-1} + (1-\lambda) \lambda n_{t-2} + (1-\lambda)^2 \lambda n_{t-3} + ... )$
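The right-hand side of this equation translates directly into code. In this Python sketch, `confirmed` is a list of (smoothed) daily confirmed cases indexed by day, and the sum is cut off at the first day of the series, since there is no data before it.

```python
def expected_deaths(confirmed, t, p, avg_days):
    """p * (lam*n[t-1] + (1-lam)*lam*n[t-2] + (1-lam)**2*lam*n[t-3] + ...),
    with lam = 1/avg_days, truncated at the first day of the series (n[0])."""
    lam = 1 / avg_days
    return p * sum(lam * (1 - lam) ** (k - 1) * confirmed[t - k] for k in range(1, t + 1))


flat = [100] * 1001  # 100 confirmed cases every day
print(round(expected_deaths(flat, t=1000, p=0.02, avg_days=20), 6))
```

With 100 cases a day for a long time and $p = 0.02$, expected deaths settle at 2 a day, as they should.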

Now, all these $n$s are known. $d_t$ is known. $\lambda$ comes from our assumption of how long people will, on average, take to die once their infection has been detected. So in the above equation, everything except $p$ is known.

And we have this data for multiple days. We know the left hand side. We know the value in brackets on the right hand side. All we need to do is to find $p$, which I did using a simple regression.
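The text doesn’t say exactly which regression was used. Since the equation has no constant term, one natural reading (my assumption) is ordinary least squares through the origin, where the estimate is $\hat{p} = \sum_t x_t d_t / \sum_t x_t^2$, with $x_t$ the bracketed sum. A Python sketch with hypothetical inputs:

```python
def fit_p_through_origin(x, d):
    """Least-squares slope of d = p * x with no intercept: sum(x*d) / sum(x*x)."""
    if len(x) != len(d) or not x:
        raise ValueError("x and d must be non-empty and of equal length")
    return sum(xi * di for xi, di in zip(x, d)) / sum(xi * xi for xi in x)


# Hypothetical: x_t = bracketed sum for day t, d_t = smoothed deaths on day t
x = [100, 120, 150, 170, 200]
d = [3.1, 3.5, 4.6, 5.0, 6.1]
print(round(fit_p_through_origin(x, d), 4))
```

These made-up inputs give $\hat{p} \approx 0.030$, i.e. a 3% case fatality rate.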

And I did this for each state – take the number of confirmed cases on each day, the number of deaths on each day and your assumption on average number of days after detection that a person dies. And you can calculate $p$, which is the case fatality rate. The true proportion of cases that are resulting in deaths.

This produced the first graph that I’ve presented above, for the assumption that a person, should he die, dies on an average 20 days after the infection is detected.

So what is India’s case fatality rate? While the first graph says it’s 5.8%, the variation by state suggests that many mild cases are going undetected, so the true case fatality rate is likely far lower. From doing my daily updates on Twitter, I’ve come to trust Andhra Pradesh as a state that is testing well, so if we assume they’ve found all their active cases, we can use that as a base and arrive at the second graph in terms of the true number of cases in each state.

PS: It’s common to just divide the number of deaths so far by number of cases so far, but that is an inaccurate measure, since it doesn’t take into account the vintage of cases. Dividing deaths by number of cases as of a fixed point of time in the past is also inaccurate since it doesn’t take into account randomness (on when a Romeo might die).
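The vintage problem is easy to demonstrate. In the Python sketch below, cases grow 5% a day and every case dies with probability $p = 0.02$ after a geometric delay averaging 20 days (all assumptions for illustration). Dividing cumulative deaths by cumulative cases badly understates $p$, because most cases are too recent to have run their course.

```python
def naive_cfr(p=0.02, avg_days=20, growth=1.05, days=60):
    """Cumulative deaths / cumulative cases for a toy epidemic in which
    confirmed cases grow geometrically and each case dies with probability p
    after a geometric delay with mean avg_days."""
    lam = 1 / avg_days
    cases = [100 * growth ** t for t in range(days)]
    deaths = [
        p * sum(lam * (1 - lam) ** (k - 1) * cases[t - k] for k in range(1, t + 1))
        for t in range(days)
    ]
    return sum(deaths) / sum(cases)


print(f"true p = 2.0%, naive estimate = {naive_cfr():.2%}")
```

With these assumptions, the naive ratio comes out under 1%, less than half the true 2%.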

Anyway, here is my code, for what it’s worth.

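```r
library(dplyr)
library(tidyr)
library(broom)

# covid: wide data frame with a Date column (e.g. "14-Mar-20"), a Status column
# ("Confirmed", "Deceased", ...) and one column of daily counts per state.
# avgDays: assumed average number of days from detection to death.
# Note: all states present in `covid` are pooled into one regression; pass one
# state's column at a time to get a per-state case fatality rate.
deathRate <- function(covid, avgDays) {
  covid %>%
    mutate(Date = as.Date(Date, '%d-%b-%y')) %>%
    gather(State, Number, -Date, -Status) %>%
    spread(Status, Number) %>%  # one column per status (Confirmed, Deceased, ...)
    arrange(State, Date) ->
    cov1

  # Need to smooth everything by 7 days (trailing moving average)
  cov1 %>%
    arrange(State, Date) %>%
    group_by(State) %>%
    mutate(
      TotalConfirmed = cumsum(Confirmed),
      TotalDeceased = cumsum(Deceased),
      ConfirmedMA = (TotalConfirmed - lag(TotalConfirmed, 7)) / 7,
      DeceasedMA = (TotalDeceased - lag(TotalDeceased, 7)) / 7
    ) %>%
    ungroup() %>%
    filter(!is.na(ConfirmedMA)) %>%
    select(State, Date, Deceased = DeceasedMA, Confirmed = ConfirmedMA) ->
    cov2

  lambda <- 1 / avgDays

  # Pair each death date with cases confirmed 1 to 100 days earlier, weight each
  # by the chance lambda * (1 - lambda)^(Delay - 1) of dying exactly Delay days
  # after detection, and add up: this is the bracketed sum in the equation above.
  cov2 %>%
    select(DeathDate = Date, State, Deceased) %>%
    inner_join(
      cov2 %>%
        select(ConfirmDate = Date, State, Confirmed) %>%
        crossing(Delay = 1:100) %>%
        mutate(DeathDate = ConfirmDate + Delay),
      by = c("DeathDate", "State")
    ) %>%
    mutate(Weighted = Confirmed * lambda * (1 - lambda)^(Delay - 1)) %>%
    group_by(State, DeathDate, Deceased) %>%
    summarise(Expected = sum(Weighted), .groups = "drop") %>%
    filter(Deceased > 0) %>%
    # Regression through the origin: Deceased = p * Expected, so p is the CFR
    lm(Deceased ~ Expected - 1, data = .) %>%
    tidy() %>%
    pull(estimate) %>%
    first()
}
```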

## Games of luck and skill

My good friend Anuroop has two hobbies – poker and wildlife photography. And when we invited him to NED Talks some 5 years ago, he decided to combine these two topics into the talk, by speaking about “why wildlife photography is like poker” (or the other way round, I’ve forgotten).

I neither do wildlife photography nor play poker so I hadn’t been able to appreciate his talk in full when he delivered it. However, our trip to Jungle Lodges River Tern Resort (at Bhadra Wildlife Sanctuary) earlier this year demonstrated to me why poker and wildlife photography are similar – they are both “games of luck AND skill”.

One debate that keeps coming up in Indian legal circles is whether a particular card game (poker, rummy, etc.) is a “game of luck” or a “game of skill”. While this might sound esoteric, it is a rather important matter – games of skill don’t need any permission from any authority, while games of luck are banned to different extents by different states (they are seen as being similar to “gambling”, and the moralistic Indian states don’t want to permit that).

Many times in the recent past, courts in India have declared poker and rummy to be “games of skill”, which means “authorities” cannot disrupt any such games. Still, for different reasons, they remain effectively illegal in certain states.

In any case, what makes games like poker interesting is that they combine skill and luck. This is also what makes games like this addictive. That there is skill involved means that you get constantly better over time, and the more you play, the greater the likelihood that you will win (ok it doesn’t increase at the same rate for everyone, and there is occasional regression as well).

If it were a pure game of skill, then things would get boring, since in a game of skill the better player wins every single time. So unless you get a “sparring partner” of approximately your own level, nobody will want to play with you (this is one difficulty with games like chess).

With luck involved, however, the odds change. It is possible to beat someone much better (on average) than you, or lose to someone much worse (on average). In other words, if you are designing an Elo rating system for a game like poker, you need to change players’ ratings by very little after each game (compared to a game of pure skill such as chess).
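In Elo terms, “changing ratings by very little” means using a small K-factor in the update $R' = R + K(S - E)$, where $S$ is the actual result and $E$ the expected one. A Python sketch using the standard logistic expected-score formula on a 400-point scale; the K values of 32 and 4 are illustrative, not calibrated for poker or chess:

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k):
    """Return A's new rating; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return r_a + k * (score_a - expected_score(r_a, r_b))


# An upset: a 1500-rated player beats a 1700-rated one.
print(elo_update(1500, 1700, 1, k=32))  # larger K: suits a game of mostly skill
print(elo_update(1500, 1700, 1, k=4))   # smaller K: suits a luck-heavy game
```

With the smaller K, a single upset moves the rating only slightly, so it takes many games before the ratings separate the better player from the worse one.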

Because there is luck involved, there is “greater information content” in the result of each game (remember from information theory that a perfectly fair coin has the most information content (1 bit) among all coins). And this makes the game more fun to play. And the better player is seen as better only when lots of games are played. And so people want to play more.
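Think of each game as a coin toss whose bias is the stronger player’s chance of winning. The Shannon entropy of such a coin, in bits, peaks at 1 for a fair coin and falls towards 0 as the outcome becomes a foregone conclusion. A short Python illustration, where the probabilities are arbitrary examples:

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0, 1):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for p in (0.5, 0.7, 0.9, 0.99):
    print(p, round(binary_entropy(p), 3))
```

A pure game of skill between unequal players sits near the right end of this list; adding luck pulls each game back towards the fair-coin end.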

It is the same with wildlife photography. It is a game of skill because as you do more and more of it, you know where to look for the tigers and leopards (and ospreys and wild dogs). You know where and how long you should wait to maximise your chances of a “sighting”. The more you do it, the better you become at photography as well.

And it is a game of luck because despite your best laid plans, there is a huge amount of luck involved. Just on the day you set up, the tiger might decide to take another path to the river. The osprey might decide on a siesta that is a little bit longer than usual.

At the entrance of JLR River Tern Lodge, there is a board that shows what animals were “sighted” during each safari in the preceding week. Each day, the resort organises two safaris, one each in the morning and afternoon, and some of them are by boat and some by jeep.

I remember studying the boards and trying to divine patterns to decide when we should go by boat and when by jeep (on the second day of our stay there, we were the “longest staying guests” and were thus given the choice of safari). On the first evening, in our jeep safari, we saw a herd of elephants. And a herd of gaur. And lots of birds. And a dead deer.

That we had “missed out” on tigers and leopards meant that we wanted to do it again. If what we saw depended solely on the skill of the naturalist and the driver who accompanied us, we would not have been excited to go into the forest again.

However, the element of luck meant that we wanted to just keep going, and going.

Games of pure luck or pure skill can get boring after a while. However, when both luck and skill get involved, they can really really get addictive. Now I fully appreciate Anuroop’s NED Talk.

## I don’t know which 80%

Legendary retailer John Wanamaker (who pioneered fixed price stores in the mid 1800s) is supposed to have said that “half of all advertising is useless. The trouble is I don’t know which half”.

I was playing around with my twitter archive data, and was looking at the distribution of retweets and favourites across all my tweets. To say that it follows a power law is an understatement.

Before this blog post triggers an automated tweet, I have 63,793 tweets, of which 59,275 (93%) have not had a single retweet. 51,717 (81%) have not had a single person liking them. And 50,165 (79%) of all my tweets have not had a single retweet or a favourite.
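The counting itself is simple. A Python sketch, assuming each tweet has already been reduced to a hypothetical `(retweets, favourites)` pair (parsing the Twitter archive is left out), along with a check of the percentages quoted above:

```python
def zero_engagement_shares(tweets):
    """tweets: list of (retweets, favourites) pairs, one per tweet.

    Returns the shares with no retweets, no favourites, and neither.
    """
    n = len(tweets)
    no_rt = sum(1 for rt, fav in tweets if rt == 0)
    no_fav = sum(1 for rt, fav in tweets if fav == 0)
    neither = sum(1 for rt, fav in tweets if rt == 0 and fav == 0)
    return no_rt / n, no_fav / n, neither / n


# Toy data, hypothetical
sample = [(0, 0), (0, 2), (1, 0), (3, 5)]
print(zero_engagement_shares(sample))  # → (0.5, 0.5, 0.25)

# Reproducing the percentages in the paragraph above
for count in (59_275, 51_717, 50_165):
    print(f"{count / 63_793:.0%}")
```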

In other words, nearly 80% of all my tweets had absolutely no impact on the world. They might as well not have existed. Which means that I should cut my time spent tweeting to a fifth. Just that, to paraphrase Wanamaker, I don’t know which four fifths I should eliminate!

There is some good news, though. Over time, the proportion of my tweets that has no impact (in terms of retweets or favourites – the twitter dump doesn’t give me the number of replies to a tweet) has been falling consistently.

Right now, this month, the score is around 33% or so. So even though the proportion of my useless tweets has been dropping over time, even now one in every three tweets that I tweet has zero impact.

My “most impactful tweet” by itself accounts for 17% of all the retweets that I’ve got. Here I look at what proportion of tweets have accounted for what proportion of “reactions” (the reactions for each tweet are defined as the sum of its number of retweets and its number of favourites. I understand that the same person might have both retweeted and favourited a tweet, but I ignore that for now).

Notice how extreme the graph is. 0.7% of all my tweets have accounted for 50% of all retweets and likes! 10% of all my tweets have accounted for 90% of all retweets and likes.
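Numbers like “0.7% of tweets account for 50% of reactions” come from sorting tweets by reactions, highest first, and walking down the list until the running total crosses the target. A Python sketch with a hypothetical function name and toy data:

```python
def top_share_needed(reactions, target):
    """Smallest fraction of tweets whose reactions sum to at least `target`
    (e.g. 0.5) of all reactions, taking the most-reacted tweets first."""
    ordered = sorted(reactions, reverse=True)
    total = sum(ordered)
    running = 0
    for i, r in enumerate(ordered, start=1):
        running += r
        if running >= target * total:
            return i / len(ordered)
    return 1.0


# Toy data: reactions (retweets + favourites) for ten hypothetical tweets.
toy = [100, 50, 25, 10, 5, 5, 3, 2, 0, 0]
print(top_share_needed(toy, 0.5), top_share_needed(toy, 0.9))
```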

Even if I look only at recent data, it doesn’t change shape that much – starting from January 2019, 0.8% of my tweets have accounted for 50% of all retweets and likes.

This, I guess, is the fundamental nature of social media. The impact of a particular tweet follows a power law with a very small exponent (meaning highly unequal).
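To make “small exponent means highly unequal” concrete: under a Pareto (power-law) distribution with tail exponent $\alpha > 1$, the top fraction $q$ of items holds a share $q^{1 - 1/\alpha}$ of the total. At $\alpha = \log 5 / \log 4 \approx 1.16$ this gives the familiar 80/20 rule, and as $\alpha$ approaches 1 the top fifth holds nearly everything. A Python sketch; treating tweet reactions as exactly Pareto is an assumption for illustration, not something fitted to my data:

```python
import math


def pareto_top_share(q, alpha):
    """Share of the total held by the top fraction q under a Pareto law (alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the share formula needs alpha > 1")
    return q ** (1 - 1 / alpha)


# alpha = log(5)/log(4) gives the classic 80/20 rule; smaller alphas are more extreme.
for alpha in (math.log(5) / math.log(4), 1.1, 1.01):
    print(round(alpha, 3), round(pareto_top_share(0.2, alpha), 3))
```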

What this also means is that anyone can go viral. Anyone can go from zero to hero in a single day. It is very hard to predict who is going to be a social media sensation some day.

So it’s okay that 80% of my tweets have no traction. I got one blockbuster, and who knows, I might have another some day. I guess such blockbusters are what we live for.