Tests per positive case

I seem to be becoming a sort of “testing expert”, though the so-called “testing mafia” (OK, I’m the only one who calls them that) may disagree. Nothing external has happened since the last time I wrote about this topic, but here is more “expertise” from my end.

As some of you might be aware, I’ve now created a script that does the daily updates that I’ve been doing on Twitter for the last few weeks. After I went off twitter last week, I tried for a couple of days to get friends to tweet my graphs. That wasn’t efficient. And I’m not yet over the twitter addiction enough to log in to twitter every day to post my daily updates.

So I’ve done what anyone who has a degree in computer science, and who has a reasonable degree of self-respect, should do – I now have a script (that runs on my server) that generates the graph and some mildly “intelligent” commentary and puts it out at 8am every day. Today’s update looked like this:

Sometimes I make the mistake of going to twitter and looking at the replies to these automated tweets (that can be done without logging in). Most replies seem to be from the testing mafia. “All this is fine but we’re not testing enough so can’t trust the data”, they say. And then someone goes off on “tests per million” as if that is some gold standard.

As I discussed in my last post on this topic, random testing is NOT a good thing here. There are several ethical issues with it. The error rates of the tests mean that there is a high chance of false positives, and also false negatives. So random testing can both “unleash” infected people, and unnecessarily clog hospital capacity with the uninfected.

So if random testing is not a good metric on how adequately we are testing, what is? One idea comes from this Yahoo report on covid management in Vietnam.

According to data published by Vietnam’s health ministry on Wednesday, Vietnam has carried out 180,067 tests and detected just 268 cases, 83% of whom it says have recovered. There have been no reported deaths.

The figures are equivalent to nearly 672 tests for every one detected case, according to the Our World in Data website. The next highest, Taiwan, has conducted 132.1 tests for every case, the data showed.

Total tests per positive case. Now, that’s an interesting metric. The basic idea is that if most of the people we are testing show positive, then we simply aren’t testing enough. However, if we are testing a lot of people for every positive case, then it means that we are also testing a large number of marginal cases (there is one caveat I’ll come to).

Also, tests per positive case takes the “base rate” into account. If a region has been affected massively, the base rate itself will be high, and the region needs to test more. A less affected region needs less testing (remember we only test those with a high base rate). And it is likely that in a region with a higher base rate, more positive cases are found (this is a deadly disease, so anyone with more than a mild occurrence of it is bound to get themselves tested).

The only caveat here is that the tests need to be “of high quality”, i.e. they should be done on people with high base rates of having the disease. Any measure that becomes a target is bound to be gamed, so if tests per positive case becomes a target, it is easy for a region to game it by testing random people (rather than those with high base rates). For now, let’s assume that nobody has made this a target yet, so there isn’t that much gaming yet.

So how is India faring? Based on data from covid19india.org, as of yesterday (23rd April) India had done about 520,000 tests, of which about 23,000 had come back positive. In other words, India has tested about 23 people for every positive case. Compared to Vietnam (or even Taiwan), that’s a really low number.

However, different states are testing to different extents by this metric. Again using data from covid19india.org, I created this chart that shows the cumulative “tests per positive case” in each state in India. I drew each state in a separate graph, with different scales, because they were simply not comparable.

Notice that Maharashtra, our worst affected state, is only testing 14 people for every positive case, and this number is going down over time. Testing capacity in that state (which has, in absolute numbers, done the maximum number of tests) is sorely stretched, and it is imperative that testing be scaled up massively there. It seems highly likely that testing has been backlogged there, with not enough capacity to test the high base rate cases. Gujarat and Delhi, other badly affected states, are in the same boat, testing only 16 and 13 people (respectively) for every infected person.

At the other end, Orissa is doing well, testing 230 people for every positive case (and this number is rising). Karnataka is not bad either, with about 70 tests per case (again increasing; the state massively stepped up testing last Thursday). Andhra Pradesh is doing nearly 60. Haryana is doing 65.

Now I’m waiting for the usual suspects to reply to this (either on twitter, or as a comment on my blog) saying this doesn’t matter because we are “not doing enough tests per million”.

I wonder why some people are proud to show off their innumeracy (OK fine, I understand that it’s a bit harsh to describe someone who doesn’t understand Bayes’s Theorem as “innumerate”).


Zoom in, zoom out

It was early on in the lockdown that the daughter participated in her first ever Zoom videoconference. It was an extended family call, with some 25 people across 9 or 10 households.

It was chaotic, to say the least. A family call meant there was no “moderation” of the sort you see in work calls (“mute yourself unless you’re speaking”, etc.). Each location had an entire family, so apart from talking on the call (which was chaotic anyway with so many people), people started talking among themselves. And that made it all the more chaotic.

Soon the daughter was shouting that it was getting too loud, and turned my computer volume down to the minimum (she’s figured out most of my computer controls in the last 2 months). After that, she lost interest and ran away.

A couple of weeks later, the wife was on a zoom call with a big group of her friends, and asked the daughter if she wanted to join. “I hate zoom, it’s too loud”, the daughter exclaimed and ran away.

Since then she has taken part in a couple of zoom calls, organised by her school. She sat with me once when I chatted with a (not very large) group of school friends. But I don’t think she particularly enjoys Zoom, or large video calls. And you need to remember that she is a “video call native”.

The early days of the lockdown were ripe times for people to turn into gurus, and make predictions in the hope that nobody would remember them in case they didn’t come true (I indulged in some of this as well). One that made the rounds was that group video calling would become much more popular and even replace group meetings (especially in the immediate aftermath of the pandemic).

I’m not so sure. While the rise of video calling has indeed given me an excuse to catch up “visually” with friends I haven’t seen in ages, I don’t see that much value from group video calls, after having participated in a few. The main problem is that there can, at a time, be only one channel of communication.

A few years back I’d written about the “anti two pizza rule” for organising parties, where I said that if you have a party, you should either have five or fewer guests, or ten or more (or something of the sort). The idea was that five or fewer can indeed have one coherent conversation without anyone being left out. Ten or more means the group naturally splits into multiple smaller groups, with each smaller group able to have conversations that add value to them.

In between (6-9 people) means it gets awkward – the group is too small to split, and too large to have one coherent conversation, and that makes for a bad party.

Now take that online. Because we have only one audio channel, there can only be one conversation for the entire group. This means that for a group of 10 or above, any “cross talk” needs to be necessarily broadcast, and that interferes with the main conversation of the group. So however large the group size of the online conversation, you can’t split the group. And the anti two pizza rule becomes “anti greater than or equal to two pizza rule”.

In other words, for an effective online conversation, you need to have four (or at most five) participants. Else you risk the group getting unwieldy, some participants feeling left out or bored, or so much cross talk that nobody gets anything out of it.

So Zoom (or any other video chat app) is not going to replace any of our regular in-person communication media. It might do so to a small extent in the immediate wake of the pandemic, when people are afraid to meet in large groups, but that will die out after that. OK, that is one more prediction from my side.

In related news, I swore off lecturing in webinars some five years ago. I found it really stressful to lecture without the ability to look into the eyes of the “students”. I wonder if teachers worldwide, who are being forced to lecture online because schools are shut, feel the way I do.

More on covid testing

There has been a massive jump in the number of covid-19 positive cases in Karnataka over the last couple of days. Today, there were 44 new cases discovered, and yesterday there were 36. This is a big jump from the average of about 15 cases per day in the preceding 4-5 days.

The good news is that not all of this is new infection. A lot of cases that have come out today are clusters of people who have collectively tested positive. However, there is one bit from yesterday’s cases (again a bunch of clusters) that stands out.

Source: covid19india.org

I guess by now everyone knows what “travelled from Delhi” is a euphemism for. The reason they are interesting to me is that they are based on a “repeat test”. In other words, all these people had tested negative the first time they were tested, and then they were tested again yesterday and found positive.

Why did they need a repeat test? That’s because the sensitivity of the Covid-19 test is rather low. Out of every 100 infected people who take the test, only about 70 are found positive (on average). That also depends upon when the sample is taken. From the abstract of this paper:

Over the four days of infection prior to the typical time of symptom onset (day 5) the probability of a false negative test in an infected individual falls from 100% on day one (95% CI 69-100%) to 61% on day four (95% CI 18-98%), though there is considerable uncertainty in these numbers. On the day of symptom onset, the median false negative rate was 39% (95% CI 16-77%). This decreased to 26% (95% CI 18-34%) on day 8 (3 days after symptom onset), then began to rise again, from 27% (95% CI 20-34%) on day 9 to 61% (95% CI 54-67%) on day 21.

About one in three infected people (depending upon when you draw the sample) are found by the test to be uninfected. Maybe I should state it again. If you test a covid-19 positive person for covid-19, there is almost a one-third chance that she will be found negative.

The good news (on the face of it) is that the test has “high specificity” of about 97-98% (this is from conversations I’ve had with people in the know; I’m unable to find links to corroborate this), or a false positive rate of 2-3%. That seems rather accurate, except that when the “prior probability” of having the disease is low, even this specificity is not good enough.

Let’s assume that a million Indians are covid-19 positive (the official numbers as of today are a little more than one-hundredth of that number). With one and a third billion people, that represents 0.075% of the population.

Let’s say we were to start “random testing” (as a number of commentators are advocating), and were to pull a random person off the street to test for Covid-19. The “prior” (before testing) likelihood she has Covid-19 is 0.075% (assume we don’t know anything more about her to change this assumption).

If we were to take 20000 such people, 15 of them will have the disease, and the other 19985 won’t. Let’s test all 20000 of them.

Of the 15 who have the disease, the test returns “positive” for 10.5 on average (70% sensitivity; round up to 11). Of the 19985 who don’t have the disease, the test returns “positive” for 400 of them (let’s assume a specificity of 98%, or a false positive rate of 2%, placing more faith in the test)! In other words, if there were a million Covid-19 positive people in India, and a random Indian were to take the test and test positive, the likelihood she actually has the disease is 11/411 = 2.6%.

If there were 10 million covid-19 positive people in India (no harm in supposing), then the “base rate” would be .75%. So out of our sample of 20000, 150 would have the disease. Again testing all 20000, 105 of the 150 who have the disease would test positive. 397 of the 19850 who don’t have the disease will test positive. In other words, if there were ten million Covid-19 positive people in India, and a random Indian were to take the test and test positive, the likelihood she actually has the disease is 105/(397+105) = 21%.

If there were ten million Covid-19 positive people in India, only one-fifth of the people who tested positive in a random test would actually have the disease.

Take a sip of water (ok I’m reading The Ken’s Beyond The First Order too much nowadays, it seems).
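For those who want to verify this arithmetic, here is a quick sketch in R (my own illustration of the calculation above, nothing more):

posterior <- function(prior, sensitivity = 0.7, specificity = 0.98) {
  true_positives <- prior * sensitivity                # infected people who test positive
  false_positives <- (1 - prior) * (1 - specificity)   # uninfected people who test positive
  true_positives / (true_positives + false_positives)
}

posterior(0.00075)   # one million infected: ~2.6%
posterior(0.0075)    # ten million infected: ~21%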

This is all standard maths stuff, and any self-respecting book or course on probability and Bayes’s Theorem will have at least a reference to AIDS or cancer testing. The story goes that this was a big deal in the 1990s when some people suggested that the AIDS test be used widely. Then, once this problem of false positives and posterior probabilities was pointed out, the strategy of only testing “high risk cases” got accepted.

And with a “low incidence” disease like covid-19, effective testing means you test people with a high prior probability. In India, that has meant testing people who travelled abroad, people who have come in contact with other known infected, healthcare workers, people who attended the Tablighi Jamaat conference in Delhi, and so on.

The advantage with testing people who already have a reasonable chance of having the disease is that once the test returns positive, you can be pretty sure they actually have the disease. It is more effective and efficient. Testing people with a “high prior probability of disease” is not discriminatory, or a “sampling bias” as some commentators alleged. It is prudent statistical practice.

Again, as I found to my own detriment with my tweetstorm on this topic the other day, people are bound to see politics and ascribe political motives to everything nowadays. In that sense, a lot of the commentary is not surprising. It’s also not surprising that when “one wing” heavily retweeted my article, “the other wing” made efforts to find holes in my argument (which, again, is textbook math).

One possibly apolitical criticism of my tweetstorm was that “the purpose of random testing is not to find out who is positive. It is to find out what proportion of the population has the disease”. The costs of this (apart from the monetary cost of actually testing) are threefold. Firstly, a large number of uninfected people will get hospitalised in covid-specific hospitals, clogging hospital capacity and increasing the chances that they get infected while in hospital.

Secondly, getting a truly random sample in this case is tricky, and possibly unethical. When you have limited testing capacity, you would be inclined (possibly morally, even) to use it on people who already have a high prior probability.

Finally, when the incidence is small, we need a really large sample to pin down the true incidence within an acceptable range.

Let’s say 1 in 1000 Indians have the disease (or about 1.35 million people). Using the chi-square test of proportions, our estimate of the incidence of the disease varies significantly depending on how many people are tested.

If we test a 1000 people and find 1 positive, the true incidence of the disease (95% confidence interval) could be anywhere from 0.01% to 0.65%.

If we test 10000 people and find 10 positive, the true incidence of the disease could be anywhere between 0.05% and 0.2%.

Only if we test 100000 people (a truly massive random sample) and find 100 positive, then the true incidence lies between 0.08% and 0.12%, an acceptable range.
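These intervals are straightforward to reproduce in R, whose prop.test function implements the chi-square test of proportions mentioned above (the intervals below are the function’s default 95% ones; the exact endpoints may differ very slightly from the figures quoted above):

prop.test(1, 1000)$conf.int       # roughly 0.01% to 0.65%
prop.test(10, 10000)$conf.int     # roughly 0.05% to 0.2%
prop.test(100, 100000)$conf.int   # roughly 0.08% to 0.12%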

I admit that we may not be testing enough. A simple rule of thumb is that anyone with more than a 5% prior probability of having the disease needs to be tested. How we determine this prior probability is again dependent on some rules of thumb.

I’ll close by saying that we should NOT be doing random testing. That would be unethical on multiple counts.

Behavioural colour schemes

One of the seminal results of behavioural economics (a field I’m having less and less faith in as the days go by, especially once I learnt about ergodicity) is that by adding a choice to an existing list of choices, you can change people’s preferences.

For example, if you give people a choice between vanilla ice cream for ₹70 and vanilla ice cream with chocolate sauce for ₹110, most people will go for just the vanilla ice cream. However, when you add a third option, let’s say “vanilla ice cream with double chocolate sauce” for ₹150, you will see more people choosing the vanilla ice cream with chocolate sauce (₹110) over the plain vanilla ice cream (₹70).

That example I pulled out of thin air, but trust me, this is the kind of example you see in behavioural economics literature. In fact, a lot of behavioural economics research involves getting 24 undergrads to participate in an experiment (which undergrad doesn’t love free ice cream?) and giving them options like the above. Then, based on how their preferences change when the new option is added, a theory is concocted on how people choose.

The existence of “green jelly beans” (or p-value hunting, also called “p-hacking”) cannot be ruled out in such studies.

Anyway, enough bitching about behavioural economics. While its methods may not be rigorous, and its results can sometimes be explained using conventional economics, some of its insights do apply in real life. Like the one where you add a choice and people start seeing the existing choices in a different way.

The other day, Nitin Pai asked me to produce a district-wise map of Karnataka colour coded by the prevalence of Covid-19 (or the “Wuhan virus”) in each district. “We can colour them green, yellow, orange and red”, he said, “based on how quickly cases are growing in each district”.

After a few backs and forths, and using data from the excellent covid19india.org, we agreed on a formula for how to classify districts by colour. And then I started drawing maps (R now has superb methods to draw maps using ggplot2).

For the first version, I took his colour recommendations at face value, and this is what came out. 

While the data is shown clearly enough, there are two problems with this chart. Firstly, as my father might have put it, “the colours hit the eyes”. There are too many bright colours here, and it’s hard to stare at the graph for too long. Secondly, the yellow and the orange appear a bit too similar. Not good.

So I started playing around. As a first step, I replaced “green” with “darkgreen”. I think I got lucky. This is what I got. 

Just this one change (OK, I made one more change – I made the borders black, so that the boundaries between contiguous dark green districts can be seen more clearly) made so much of a difference.

Firstly, the addition of the sober dark green (rather than the bright green) means that the graph is much easier on the eye now. The same yellow and orange and red don’t “hit the eyes” like they used to in green’s company.

And more importantly (like the behavioural economics theory), the orange and yellow look much more distinct from each other now (my apologies to readers who are colour blind). Rather than trying to change the clashing colours (the other day I’d tried changing the yellow to other similar colours, but nothing had worked), adding a darker shade alongside meant that the distinctions became much more visible.
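For the R users: the fix was literally one colour name, plus a border colour. Here is a minimal stand-in sketch of the idea (the actual maps use district shapefiles with geom_sf, which I’m not reproducing here; the data frame below is made up):

library(ggplot2)

districts <- data.frame(
  district = c("A", "B", "C", "D"),
  status = factor(c("green", "yellow", "orange", "red"),
                  levels = c("green", "yellow", "orange", "red"))
)

ggplot(districts, aes(district, 1, fill = status)) +
  geom_col(colour = "black") +                        # black borders between districts
  scale_fill_manual(values = c(green = "darkgreen",   # was "green" in the first version
                               yellow = "yellow",
                               orange = "orange",
                               red = "red"))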

Maybe there IS something to behavioural economics, at least when it comes to colour schemes.

A trip to the supermarket

Normally even I wouldn’t write about a trip to a supermarket, but these aren’t normal times. With the shutdown scheduled to go on for another two weeks, and with some “essential commodities” running low at home, I decided to go stock up.

I might just have postponed my trip by a few more days, but then I saw tweets by the top cop of Bangalore saying they’re starting to seize personal vehicles out on the road during the lockdown. I needed to get some heavy stuff (rice, lentils, oils, etc.), so I decided to brave it with the car.

Having taken stock of inventory and made a longlist of things we needed, I drove out using “back roads” to the very nearby Simpli Namdhari store. While I expected lines at the large-format store, I hoped that would be compensated for by the variety of stuff I could find there.

I got there at 2:30, only to be told the store was “closed for lunch” and would reopen at 3. “All counters are open”, the security guard told me. I saw inside that the store was being cleaned. Since it’s a 3 minute drive away, I headed back home and returned at 3:15.

There was a small line (10-15 people long) when I got there. I must mention I was super impressed by the store at the outset. Lines had been drawn outside to ensure queueing at a safe distance. Deeper in the queue, chairs had been placed (again at a safe distance from each other) to queue in comfort. They were letting in people about 10 at a time, waiting for an equal number to exit the store each time.

It was around 3:35 by the time I got in (a 20 minute wait). From the entrance, most shelves seemed full.

The thing with Namdhari’s is that they control the supplies of a large number of things they sell (fruits, vegetables, dairy, bread, etc.), and all of them were well stocked. In times like this (I can’t believe I’m using this phrase!), some sort of vertical integration helps, since you can produce the stuff because you know the downstream demand.

(in any case, for things like vegetables and milk, where there is a large gap between “sowing” and “reaping”, production hasn’t fallen at all. It’s a massive supply chain problem and plenty of stuff is getting wasted while people don’t have enough. Stuff like bread is where vertical integration helps)

In any case I took two trips round the supermarket with my trolley, checking things off my list as I put items into the trolley (unusual times mean even disorganised people like me make checklists). Again the vertical integration showed.

Stuff that Namdhari’s owns upstream of, like staples and oils, was well stocked. High demand stuff for which Namdhari’s is only a reseller, like Maggi or crisps or biscuits, was poorly stocked. Interestingly, “exotic stuff” (like peanut butter or cheeses, around which Namdhari’s has partly built its reputation) was reasonably well stocked, for which I was really thankful (we consume far more of these than the average Indian household).

How much to buy was a dilemma I had in my head through the shopping trip. For one, there was the instinct to hoard, since I was clear I didn’t want another shopping trip like this until the shutdown ends (milk, vegetables and eggs are reasonably easily available close to home, but I wasn’t there for that).

On the other hand, I was “mindful” of “fair usage policy”, to not take more than what I needed, since you didn’t want stockouts if you could help it.

The other thing that shortages do to you is that you buy stuff you don’t normally buy. Like the other day at another shop I’d bought rice bran oil because groundnut oil wasn’t available. While you might buy something as “backup”, you are cognisant that if you get through the lockdown without needing this backup, this backup will never get used.

So even though we’re running short of sambar powder, I ignored it since the only sambar powder on offer looked pretty sad. On the other hand, I bought Haldiram’s Mixture since no “local mixtures” are available nowadays, and mixture is something I love having with my curd rice.

I was a little more “liberal” with stuff that I know won’t go bad, such as dry fruits or staples, but then again that’s standard inventory management – you are willing to hold higher inventories of items with longer shelf life.

I might have taken a bit longer there to make sure I’d got everything on my list, but then my “mask”, made out of a hanky and two rubber bands, had started to hurt. So, with half my list unfulfilled, I left.

Even at the checkout line, people stood a metre away from each other. You had to bag your own groceries, which isn’t a standard thing in India, but is enforced now since you don’t want too many hands touching your stuff.

Oh, and plenty of people had come by car to the store. There were cops around, but they didn’t bother anyone.

Simulating Covid-19 Scenarios

I must warn that this is a super long post. Also I wonder if I should put this on medium in order to get more footage.

Most models of disease spread use what is known as a “SIR” framework. This Numberphile video gives a good primer into this framework.

The problem with the framework is that it’s too simplistic. It depends primarily on one parameter, R0, which is the average number of people each infected patient infects. When R0 is high, each patient infects a number of other people, and the disease spreads fast. With a low R0, the disease spreads slowly. It was the SIR model that was used to produce all those “flatten the curve” pictures that we were bombarded with a week or two back.

There is a second parameter as well – the recovery or removal rate. Some diseases are so lethal that they have a high removal rate (e.g. Ebola), and this puts a natural limit on how much the disease can spread, since infected people die before they can infect too many others.
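For reference, these are the standard SIR equations, with S susceptible, I infected, R removed, N the total population, \beta the rate of transmission and \gamma the rate of removal (so that R0 = \beta / \gamma):

\frac{dS}{dt} = -\frac{\beta S I}{N} ; \quad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I ; \quad \frac{dR}{dt} = \gamma I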

In any case, such modelling is great for academic studies, and post-facto analyses where R0 can be estimated. As we are currently in the middle of an epidemic, this kind of simplistic modelling can’t take us far. Nobody has a clue yet on what the R0 for covid-19 is. Nobody knows what proportion of total cases are asymptomatic. Nobody knows the mortality rate.

And things are changing well-at-a-faster-rate. Governments are imposing distancing of various forms. First offices were shut down. Then shops were shut down. Now everything is shut down, and many of us have been asked to step out “only to get necessities”. And in such dynamic and fast-changing environments, a simplistic model such as the SIR can only take us so far, and uncertainty in estimating R0 means it can be pretty much useless as well.

In this context, I thought I’d simulate a few real-life situations, and try to model the spread of the disease in these situations. This can give us an insight into what kind of services are more dangerous than others, and how we could potentially “get back to life” after going through an initial period of lockdown.

The basic assumption I’ve made is that the longer you spend with an infected person, the greater the chance of getting infected yourself. This is not an unreasonable assumption because the spread happens through activities such as sneezing, touching, inadvertently dropping droplets of your saliva on to the other person, and so on, each of which is more likely the longer the time you spend with someone.

Some basic modelling revealed that this can be modelled as a sort of negative exponential curve that looks like this.

p = 1 - e^{-\lambda T}

T is the number of hours you spend with the other person. \lambda is a parameter of transmission – the higher it is, the more likely the disease will transmit (holding the amount of time spent together constant).

The function looks like this: 

We have no clue what \lambda is, but I’ll make an educated guess based on some limited data I’ve seen. I’ll take a conservative estimate and say that if an uninfected person spends 24 hours with an infected person, the former has a 50% chance of getting the disease from the latter.

This gives the value of \lambda to be 0.02888 per hour. We will now use this to model various scenarios.
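In R, this calibration (and the transmission function we will keep reusing below) looks like this – remember that the 50%-in-24-hours number is my own educated guess, as stated above:

lambda <- log(2) / 24                          # solves 1 - exp(-24 * lambda) = 0.5; ~0.02888 per hour
p_infect <- function(T) 1 - exp(-lambda * T)   # T is in hours

p_infect(2 / 60)    # a two-minute interaction: ~0.1%
p_infect(20 / 60)   # twenty minutes with one infected person: ~1%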

1. Delivery

This is the simplest model I built. There is one shop, and N customers. Customers come one at a time and spend a fixed amount of time (1 or 2 or 5 minutes) at the shop, which has one shopkeeper. Initially, a proportion p of the population is infected, and we assume that the shopkeeper is uninfected.

And then we model the transmission – based on our \lambda = 0.02888, for a two minute interaction, the probability of transmission is 1 - e^{-\lambda T} = 1 - e^{-0.02888 \times 2 / 60} \approx 0.1%.

In hindsight, I realised that this kind of a set up better describes “delivery” than a shop. With a 0.1% probability the delivery person gets infected from an infected customer during a delivery. With the same probability an infected delivery person infects a customer. The only way the disease can spread through this “shop” is via the shopkeeper / delivery person – she first has to catch it from a customer, and can then pass it on to subsequent customers.

How does it play out? I simulated 10000 paths where one guy delivers to 1000 homes (maybe over the course of a week? that doesn’t matter, as long as the overall infection rate in the population remains constant), and spends exactly two minutes at each delivery, which is made to a single person. Let’s take a few cases, with different base rates of incidence of the disease – 0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20% and 50%.
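Here is a sketch of one path of this simulation (my reconstruction for illustration – the actual, unedited code is linked at the end of this post; this reuses p_infect from above):

simulate_delivery <- function(n_homes = 1000, base_rate = 0.5, minutes = 2) {
  p <- p_infect(minutes / 60)                  # per-delivery transmission probability
  customer_infected <- runif(n_homes) < base_rate
  deliverer_infected <- FALSE                  # the delivery person starts uninfected
  new_infections <- 0
  for (i in seq_len(n_homes)) {
    if (deliverer_infected && !customer_infected[i] && runif(1) < p) {
      new_infections <- new_infections + 1     # delivery person infects a customer
    } else if (!deliverer_infected && customer_infected[i] && runif(1) < p) {
      deliverer_infected <- TRUE               # a customer infects the delivery person
    }
  }
  new_infections
}

new_cases <- replicate(10000, simulate_delivery(base_rate = 0.5))
mean(new_cases == 0)   # even at a 50% base rate, most paths see zero new infections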

The number of NEW people infected in each case is graphed here (we don’t care how many got the disease otherwise; we’re modelling how many got it from our “shop”). The right-hand graph excludes the case of zero new infections, just to show you the scale of the problem.

Notice this – even when 50% of the population is infected, as long as the shopkeeper or delivery person is not initially infected, the chances of additional infections through 2-minute delivery are MINUSCULE. A strong case for policy-makers to enable delivery of all kinds, essential or inessential.

2. Shop

Now, let’s complicate matters a little bit. Instead of a delivery person going to each home, let’s assume a shop. Multiple people can be in the shop at the same time, and there can be more than one shopkeeper.

Let’s use the assumptions of standard queueing theory, and assume that customers’ inter-arrival times follow an Exponential distribution, and that the time they spend in the shop also follows an Exponential distribution.

At the time when customers are in the shop, any infected customer (or shopkeeper) inside can infect any other customer or shopkeeper. So if you spend 2 minutes in a shop where there is 1 infected person, our calculation above tells us that you have a 0.1% chance of being infected yourself. If there are 10 infected people in the shop and you spend 2 minutes there, this is akin to spending 20 minutes with one infected person, and you have a 1% chance of getting infected.

Let’s consider two or three scenarios here. First is the “normal” case where one customer arrives every 5 minutes, and each customer spends 10 minutes in the shop (note that the shop can “serve” multiple customers simultaneously, so the queue doesn’t blow up here). Again let’s take a total of 1000 customers (assume a 24/7 open shop), and one shopkeeper.
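A sketch of how this can be simulated minute by minute (again my reconstruction, reusing p_infect from above; the actual code is linked at the end of the post):

simulate_shop <- function(n_cust = 1000, base_rate = 0.05,
                          mean_gap = 5, mean_stay = 10, n_keepers = 1) {
  arrive <- cumsum(rexp(n_cust, 1 / mean_gap))    # arrival times, in minutes
  depart <- arrive + rexp(n_cust, 1 / mean_stay)  # departure times
  init <- c(runif(n_cust) < base_rate, rep(FALSE, n_keepers))  # customers, then shopkeepers
  status <- init
  for (t in seq_len(ceiling(max(depart)))) {
    present <- c(arrive <= t & depart > t, rep(TRUE, n_keepers))
    n_inf <- sum(status & present)
    if (n_inf == 0) next
    p <- p_infect(n_inf / 60)   # one minute spent with n_inf infected people
    status <- status | (present & !status & runif(n_cust + n_keepers) < p)
  }
  sum(status) - sum(init)       # new infections attributable to the shop
}

mean(replicate(1000, simulate_shop()))   # my sketch’s numbers need not match the graphs exactly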


Notice that there is significant transmission of infection here, even though we started with only 5% of the population infected. On average, another 3% of the population gets infected! Open supermarkets with the usual crowds can result in significant transmission.

Does keeping the shop open with some sort of social distancing (let’s say only one-fourth as many people arrive) work? So people arrive with an average gap of 20 minutes, and still spend 10 minutes in the shop. There is still one shopkeeper. What does it look like when we start with 5% of the people being infected?

The graph is pretty much identical so I’m not bothering to put that here!

3. Office

This scenario simulates N people working together for a certain number of hours. We assume that exactly one person is infected at the beginning of the meeting. We also assume that once a person is infected, she can start infecting others from the very next minute (with our transmission probability).

How does the infection grow in this case? This is an easier simulation than the earlier one so we can run 10000 Monte Carlo paths. Let’s say we have a “meeting” with 40 people (could just be 40 people working in a small room) which lasts 4 hours. If we start with one infected person, this is how the number of infected grows over the 4 hours.
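One Monte Carlo path of this meeting looks something like this (my sketch, reusing p_infect from above):

simulate_meeting <- function(n = 40, hours = 4) {
  infected <- c(TRUE, rep(FALSE, n - 1))   # exactly one infected at the start
  for (t in seq_len(hours * 60)) {
    k <- sum(infected)
    # each susceptible person spends this minute with k infected people
    infected <- infected | runif(n) < p_infect(k / 60)
  }
  sum(infected) - 1                        # number newly infected by the end
}

mean(replicate(10000, simulate_meeting(10, 1)))   # ~0.3, the number quoted below
mean(replicate(10000, simulate_meeting(10, 8)))   # the 8-hour “small office” case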


The spread is massive! When you have a large bunch of people in a small closed space over a significant period of time, the infection spreads rapidly among them. Even if you take a 10 person meeting over an hour, one infected person at the start can result in an average of 0.3 other people being infected by the end of the meeting.

10 persons meeting over 8 hours (a small office) with one initially infected means 3.5 others (on average) being infected by the end of the day.

Offices are dangerous places for the infection to spread. Even after the lockdown is lifted, some sort of work from home regulations need to be in place until the infection has been fully brought under control.

4. Conferences

This is another form of “meeting”, except that at each point in time, people don’t engage with the whole room, but only a handful of others. These groups form at random, changing every minute, and infection can spread only within a particular group.

Let’s take a 100 person conference with 1 initially infected person. Let’s assume it lasts 8 hours. Depending upon how many people come together at a time, the spread of the infection rapidly changes, as can be seen in the graph below.

If people talk two at a time, there’s a 63% probability that the infection doesn’t spread at all. If they talk 5 at a time, this probability is cut in half. And if people congregate 10 at a time, there’s only an 11% chance that by the end of the day the infection HASN’T propagated!
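A sketch of this simulation (my reconstruction; exactly how the random groups are formed will move these probabilities around a little, so don’t expect my sketch to reproduce the numbers above precisely):

simulate_conference <- function(n = 100, hours = 8, group_size = 2) {
  infected <- c(TRUE, rep(FALSE, n - 1))
  n_groups <- ceiling(n / group_size)
  for (t in seq_len(hours * 60)) {
    group <- sample(rep(seq_len(n_groups), length.out = n))  # reshuffle groups every minute
    for (g in seq_len(n_groups)) {
      members <- which(group == g)
      k <- sum(infected[members])
      if (k == 0) next
      infected[members] <- infected[members] |
        runif(length(members)) < p_infect(k / 60)
    }
  }
  sum(infected) - 1
}

mean(replicate(1000, simulate_conference(group_size = 2)) == 0)   # P(no spread at all)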

One takeaway from this is that even once offices start functioning, they need to impose social distancing measures (until the virus has been completely wiped out). All large-ish meetings by video conference. A certain proportion of workers working from home by rotation.

And I wonder what will happen to the conferences.

I’ve put my (unedited) code here. Feel free to use and play around.

Finally, you might wonder why I’ve made so many Monte Carlo simulations. Well, as the great Matt Levine himself said, that’s my secret sauce!


Why Border Control Is Necessary

India is shutting down its domestic flights today in order to stop the spread of Covid-19. This comes a day after shutting down the national railways and most inter-city buses. States and districts have imposed border controls to control the movement of people across borders.

The immediate reaction to this would be that this is a regressive step. After a few decades of increasing integration (national and international), this drawing of borders at minute levels might seem retrograde. Moreover, the right to move anywhere in India is a fundamental right of every citizen, and so this closing of borders might seem like a violation of fundamental rights as well.

However, the nature of the Covid-19 bug is such that these measures are not only permissible but also necessary. The evidence so far is that it has a high rate of transmission between people who meet each other – far higher than for any other flu. The mortality rate of the illness the bug causes is also low enough that each sick person has the opportunity to infect a large number of others before recovery or death (compare this with a disease such as Ebola, whose much higher death rate limited its transmission).

So far no cure for Covid-19 has been found. Instead, the best strategy found so far is to prevent infected people from meeting uninfected people. And since it is hard to know who is infected (it takes time for symptoms to develop), the strategy is to prevent people from meeting each other altogether. In fact, places like Wuhan, where the disease originated, managed to stem the disease by completely shutting down the city (it’s about the size of Bangalore).

In this context, open borders (at whatever level) can present a huge threat to Covid-19 containment. You might manage to completely stem the spread of the disease in a particular region, only to see it reappear with a vengeance thanks to a handful of people who came in (Singapore and Hong Kong have witnessed exactly this).

For this reason, the first step for a region to try and get free of the virus is to “stop importing” it. The second step is to shut down the region itself so that the already infected don’t meet the uninfected and transmit the disease to them.

A complete shutdown is also harmful to the economy, which has already taken a massive battering from the disease. For this reason, the shutdown is best done at as small a level as possible, so that the overall disruption is minimised. Different regions might also need different levels of shutdown in order to contain the disease. For all these reasons, the handling of the virus is best done at as local a level as possible: city/town better than district, better than state, better than country.

And once the spread of the disease has been stopped in a region, we should be careful that we don’t import it after that, else all the good work gets undone. For this reason, the border controls need to remain for a while longer until transmission has stopped in neighbouring (and other) regions.

It’s a rather complex process, but the main points to be noted are that the containment has to happen at a local level, and once it has been contained, we need to be careful to not import it. And for both these to happen, it is necessary that borders be shut down.