What is the Case Fatality Rate of Covid-19 in India?

The economist in me will give a very simple answer to that question – it depends. It depends on how long you think people will take from onset of the disease to die.

The modeller in me extended the argument that the economist in me made, and built a rather complicated model. This involved smoothing, assumptions on probability distributions, long mathematical derivations and (for good measure) regressions. And out of all that came this graph, with the assumption that the average person who dies of covid-19 dies 20 days after the infection is detected.

[Graph: estimated case fatality rate by state, assuming a person who dies does so on average 20 days after detection]

Yes, there is a wide variation across the country. Given that the disease is the same and the treatment for most patients is pretty much the same (lots of rest, lots of water, etc.), it is weird that the case fatality rate varies by so much across Indian states. There is only one explanation – assuming that deaths can't be faked or miscounted (covid deaths attributed to other causes, or vice versa), the problem is in the "denominator" – the number of confirmed cases.

What the variation here tells us is that in states towards the top of this graph, we are likely not detecting most of the positive cases (serious cases will get themselves tested anyway, get hospitalised, and perhaps die; it's the less serious cases that can "slip"). Taking a state low down in this graph as a "good tester" (say Andhra Pradesh), we can try and estimate the extent of under-detection of cases in each state.

Based on state-wise case tallies as of now (there might be some error, since some states might have reported today's numbers and some might not have), here are my estimates of the actual number of cases in each state, based on our calculations of the case fatality rate.

Yeah, Maharashtra alone should have crossed a million cases based on the number of people who have died there!

Now let’s get to the maths. It’s messy. First we look at the number of confirmed cases per day and number of deaths per day per state (data from here). Then we smooth the data and take 7-day trailing moving averages. This is to get rid of any reporting pile-ups.
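To make the smoothing concrete, here is a minimal sketch on a toy vector of daily counts (the numbers are made up purely for illustration); the deathRate() function further down does the same thing with dplyr's lag().

# A minimal illustration of the 7-day trailing moving average, on a toy
# vector of daily counts (made-up numbers, not real data)
daily <- c(5, 8, 12, 7, 20, 15, 18, 25, 30, 22)
cumulative <- cumsum(daily)

# For day t (from day 8 onwards, so that a full week of lag is available),
# MA_t = (cumulative_t - cumulative_{t-7}) / 7, i.e. the mean of days t-6 .. t
ma7 <- (cumulative[8:length(daily)] - cumulative[1:(length(daily) - 7)]) / 7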

Now comes the probability assumption – we assume that a proportion p of all the confirmed cases will die. We assume an average number of days (N) to death for people who are supposed to die (let's call them Romeos?). They all won't pop off exactly N days after we detect their infection. Instead, let's say that of everyone who is infected, supposed to die and not yet dead, a proportion \lambda dies each day.

My maths has become rather rusty over the years but a derivation I made shows that \lambda = \frac{1}{N}. So if people are supposed to die in an average of 20 days, \frac{1}{20} will die today, \frac{19}{20}\frac{1}{20} will die tomorrow. And so on.
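(The derivation is just the expectation of a geometric distribution: if a proportion \lambda of the not-yet-dead Romeos dies on each day, the expected number of days to death is

E[T] = \sum_{k=1}^{\infty} k \lambda (1-\lambda)^{k-1} = \frac{1}{\lambda}

and setting this expectation equal to N gives \lambda = \frac{1}{N}.)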

So people who die today could be people who were detected with the infection yesterday, or the day before, or the day before the day before (isn't it weird that English doesn't have a word for this?), or … Now, based on how many cases were detected on each day, and our assumption of p (let's assume a value first; we can derive it back later), we know how many people who were found sick k days back are going to die today. Do this for all k, and you can model how many people will die today.

The equation will look something like this. Assume d_t is the number of people who die on day t and n_t is the number of cases confirmed on day t. We get

d_t = p  (\lambda n_{t-1} + (1-\lambda) \lambda n_{t-2} + (1-\lambda)^2 \lambda n_{t-3} + ... )

Now, all these ns are known. d_t is known. \lambda comes from our assumption of how long people will, on average, take to die once their infection has been detected. So in the above equation, everything except p is known.

And we have this data for multiple days. We know the left hand side. We know the value in brackets on the right hand side. All we need to do is to find p, which I did using a simple regression.
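To make that concrete, here is a hedged sketch in R of the estimation for one state – the vectors confirmed and deceased are placeholders for the smoothed daily series n_t and d_t (they are not part of the actual code below), and the 20-day assumption is the same one as above.

# Assumed inputs: 'confirmed' and 'deceased' are numeric vectors of the
# smoothed daily confirmed cases (n_t) and deaths (d_t) for one state
avgDays <- 20
lambda <- 1 / avgDays

# For each day t, compute the bracketed sum of lagged confirmed cases,
# weighted by lambda * (1 - lambda)^(k - 1) for a delay of k days
adjusted <- sapply(seq_along(deceased), function(t) {
  k <- seq_len(t - 1)
  sum(lambda * (1 - lambda)^(k - 1) * confirmed[t - k])
})

# Regression through the origin: the slope is the case fatality rate p
fit <- lm(deceased ~ adjusted - 1)
p_hat <- unname(coef(fit))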

And I did this for each state – take the number of confirmed cases on each day, the number of deaths on each day and your assumption on average number of days after detection that a person dies. And you can calculate p, which is the case fatality rate. The true proportion of cases that are resulting in deaths.

This produced the first graph that I’ve presented above, for the assumption that a person, should he die, dies on an average 20 days after the infection is detected.

So what is India's case fatality rate? While the first graph says it's 5.8%, the variation across states suggests that we are failing to detect a lot of the milder cases, so the true case fatality rate is likely far lower. From doing my daily updates on Twitter, I've come to trust Andhra Pradesh as a state that is testing well, so if we assume they've found all their active cases, we can use that as a base and arrive at the second graph in terms of the true number of cases in each state.
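For what it's worth, the adjustment behind the second graph is a simple scaling. Here is a sketch, assuming a data frame cfr with one row per state (the name and columns are illustrative placeholders) and taking Andhra Pradesh's estimated fatality rate as the true one.

library(dplyr)

# Assumed input: 'cfr' has one row per State, with the estimated case
# fatality rate (column p) and the reported confirmed cases (column Confirmed)
base_p <- cfr %>% filter(State == "Andhra Pradesh") %>% pull(p)

cfr %>%
  # if the true fatality rate everywhere is base_p, the under-detection
  # factor in a state is p / base_p
  mutate(EstimatedActualCases = Confirmed * p / base_p) %>%
  arrange(desc(EstimatedActualCases))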

PS: It’s common to just divide the number of deaths so far by number of cases so far, but that is an inaccurate measure, since it doesn’t take into account the vintage of cases. Dividing deaths by number of cases as of a fixed point of time in the past is also inaccurate since it doesn’t take into account randomness (on when a Romeo might die).

Anyway, here is my code, for what it’s worth.

library(dplyr)
library(tidyr)

deathRate <- function(covid, avgDays) {
  # Reshape the raw data: one row per State and Date, with the Status
  # values (Confirmed / Deceased / Recovered) as columns
  covid %>%
    mutate(Date = as.Date(Date, '%d-%b-%y')) %>%
    gather(State, Number, -Date, -Status) %>%
    spread(Status, Number) %>%
    arrange(State, Date) ->
    cov1

  # Smooth everything with a 7-day trailing moving average,
  # to get rid of reporting pile-ups
  cov1 %>%
    arrange(State, Date) %>%
    group_by(State) %>%
    mutate(
      TotalConfirmed = cumsum(Confirmed),
      TotalDeceased = cumsum(Deceased),
      ConfirmedMA = (TotalConfirmed - lag(TotalConfirmed, 7)) / 7,
      DeceasedMA = (TotalDeceased - lag(TotalDeceased, 7)) / 7
    ) %>%
    ungroup() %>%
    filter(!is.na(ConfirmedMA)) %>%
    select(State, Date, Deceased = DeceasedMA, Confirmed = ConfirmedMA) ->
    cov2

  # For each death date, build the weighted sum of lagged confirmed cases
  # (weight for a delay of k days is lambda * (1 - lambda)^(k - 1)), then
  # regress deaths on that sum through the origin; the slope is p
  cov2 %>%
    select(DeathDate = Date, State, Deceased) %>%
    inner_join(
      cov2 %>%
        select(ConfirmDate = Date, State, Confirmed) %>%
        crossing(Delay = 1:100) %>%
        mutate(DeathDate = ConfirmDate + Delay),
      by = c("DeathDate", "State")
    ) %>%
    filter(DeathDate > ConfirmDate) %>%
    arrange(State, desc(DeathDate), desc(ConfirmDate)) %>%
    mutate(
      Lambda = 1 / avgDays,
      Adjusted = Confirmed * Lambda * (1 - Lambda)^(Delay - 1)
    ) %>%
    filter(Deceased > 0) %>%
    group_by(State, DeathDate, Deceased) %>%
    summarise(Adjusted = sum(Adjusted)) %>%
    ungroup() %>%
    lm(Deceased ~ Adjusted - 1, data = .) %>%
    summary() %>%
    broom::tidy() %>%
    select(estimate) %>%
    first()
}
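And a usage sketch, for completeness – the file name is a placeholder, and the data frame is assumed to have a Date column, a Status column (Confirmed / Deceased / Recovered) and one column per state, which is what the gather()/spread() steps at the top of the function expect.

covid <- read.csv("state_wise_daily.csv", stringsAsFactors = FALSE)  # placeholder file name
p_estimate <- deathRate(covid, avgDays = 20)  # assumes death comes, on average, 20 days after detection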

Books and Kindle Singles

Recently I started re-reading Vikram Chandra (the novelist and Berkeley academic)’s book “Mirrored Mind”, which has been published in the US as “Geek Sublime”. I hadn’t read it earlier – I had only read the Kindle sample and then discarded it, and I recently decided to pick it up from where I had left off.
In fact, that was hard to do, so I decided to start from the beginning once again, and so went through the introduction and preface and acknowledgements and all such before diving into the book again. This time I liked it better (not that I hadn't liked it the first time round), and so decided to buy the full book. But somewhere midway through the full book, I lost enthu, and didn't feel like reading further. My Kindle lay unused for a few days, for the "loaded" book on it was this one, and there was absolutely no enthu to continue reading it. Finally I gave up and moved on to another book.

So one point that Vikram Chandra makes in the introduction to the book is that he initially planned to make it a Kindle single, but then decided, upon the urging of his wife and others, to make it into a complete book on coding and poetry. While the intent of writing a full book is no doubt well-placed, the result doesn't really match up.

For when you try and turn a Kindle single into a full book, you try to add words and pages, and for that reason you write things that aren’t organically attached to the rest of the book. You want to add content, and depth, but instead you end up simply adding empty words – those that you could have done without, and chapters which are disconnected from the rest of the book.

And so it is the case with Vikram Chandra’s Mirrored Mind. There is a whole chapter, for example, on the sociology of the Indian software industry, which is clearly “out of syllabus” for the otherwise excellent novelist, programmer and creative writer Vikram Chandra. He goes into long expositions on the role of women in the Indian software industry, the history of the industry, etc. which are inherently interesting stories, but not when told by Chandra, who is clearly not in his zone while writing that chapter.

And then there is the chapter on Sanskrit poetry, which is anything but crisp, and so verbose that it is extremely hard to get through. There is nothing about code in the chapter, and it is very hard to cut through the verbosity and discern any references to the structure of poetry, and that lays waste to the chapter. It was while reading this chapter that I simply couldn’t proceed, and abandoned the book.

This is by no means a comparison but I’ve gone down this path, too. I’ve written so many blog posts on the taxi industry, and especially on the pricing aspects, that I thought it might make sense to put them all together and convert them into a Kindle Single. But then, as I started going through my posts and began to piece them together during my holiday in Barcelona earlier this year, I got greedy, and I thought I could convert this into a full “proper” book, and that I could become a published author.

And so I started writing, mostly in cafes that I went to for breakfast (croissant and "cortado") and for coffees. I set myself ambitious targets, such as writing at least two thousand words in each session. This might help me get out a skeleton of the book by the time my vacation ended, I reasoned.

Midway through my vacation, I decided to review my work before proceeding, and found my own writing unreadable. This is not always the case – for example, I quite enjoy going back and reading my own old blog posts. I’m quite narcissistic, in other words, when it comes to my own writing. And I found my own work-in-progress book unreadable! I immediately put a pause on it, and proceeded to fritter away the rest of my vacation in an offhand way.

I got back to Bangalore and sent the "manuscript", if it can be called that, to editor extraordinaire Sarah Farooqui. I don't know what trouble she went through reading it, but her reaction was rather crisp – that the "book" was anything but crisp, and that I should cut down on the multitude of words, sentences and paragraphs that added no value. The project remains stillborn.

So based on these two data points, one from a great novelist (none of whose novels I’ve read), and one from my not-so-humble self, I posit that a Kindle single once conceived should be left that way, and authors should not be overcome by delusions of grandeur that might lead them to believe they are in the process of writing a great work. The only thing that can come out of this is a horribly overblown book whose information content is no greater than that of the Kindle single originally conceived.

Long ago on this blog I had written about "blog posts turned into books", after reading Richard MacKenzie's book on pricing (Why Popcorn Costs So Much at the Movies). The same holds true for Kindle singles turned into books, too. And when I started writing this, I intended it to be a 500-word blog post, not the 900-word monster it has turned into. I wouldn't blame you if you didn't get this far.