Covid-19 superspreaders in Karnataka

Through a combination of luck and competence, my home state of Karnataka has handled the Covid-19 crisis rather well. While the total number of cases detected in the state edged past 2000 recently, the number of locally transmitted cases detected each day has hovered in the 20-25 range.

Perhaps because of the low case volume, Karnataka is able to release data at a level of detail that few other states in India provide. For each case, the reason the patient was tested (which is usually the source from which they caught the disease) is given. This data comes out in two daily updates through the @dhfwka twitter handle.

Some research that came out recently showed that the spread of covid-19 follows a classic power law, with a low value of “alpha”. Basically, most infected people don’t infect anyone else. But a handful of infected people infect lots of others.

The Karnataka data, put out by @dhfwka and meticulously collected and organised by the folks at (they frequently drive me mad by suddenly changing the API or moving data into a new file, but overall they’ve been doing stellar work), has enough information to check whether this sort of power law holds.

For every patient who was tested thanks to being a contact of an already infected patient, the “notes” field of the data contains the latter patient’s ID. This way, we are able to build a sort of graph on who got the disease from whom (some people got the disease “from a containment zone”, or out of state, and they are all ignored in this analysis).
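As a rough sketch of that graph-building step, here is one way the edges could be pulled out of the notes field in Python. The record layout (a list of dicts with `id` and `notes` keys) and the assumption that the source is written as "P" followed by digits (e.g. "Contact of P420") are hypothetical; the real field may be worded differently and need a different pattern.

```python
import re

# Hypothetical notes format: the source patient is assumed to appear as "P"
# followed by digits (optionally "P-"), e.g. "Contact of P420".
SOURCE_ID = re.compile(r"\bP-?(\d+)\b")


def build_edges(patients):
    """Return (source_id, patient_id) pairs for patients traced to another patient.

    `patients` is a list of dicts with hypothetical keys "id" and "notes".
    Patients whose notes name no patient ID (containment zone, out of state)
    are skipped. Only the first patient ID in a note is used.
    """
    edges = []
    for p in patients:
        match = SOURCE_ID.search(p.get("notes") or "")
        if match:
            edges.append((int(match.group(1)), p["id"]))
    return edges


# Toy records, not real Karnataka data.
sample = [
    {"id": 653, "notes": "Contact of P420"},
    {"id": 700, "notes": "Contact of P653"},
    {"id": 701, "notes": "Containment zone"},
    {"id": 702, "notes": "Contact of P-653"},
]
print(build_edges(sample))
```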

From this graph, we can approximate how many people each infected person transmitted the infection to. Here are the “top” people in Karnataka who transmitted the disease to the most people.
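Counting transmissions per person amounts to counting how often each patient appears as a source in the edge list. A minimal sketch, assuming edges are (source, infected) pairs; the edges below are made-up toy values, not the state's data.

```python
from collections import Counter


def spreader_counts(edges):
    """Number of people each source patient is recorded as having infected."""
    return Counter(source for source, _ in edges)


# Toy (source, infected) pairs, not real Karnataka data.
edges = [(653, 1), (653, 2), (653, 3), (419, 4), (419, 5), (420, 653), (7, 8)]
counts = spreader_counts(edges)

# Rank the "top" spreaders; ties keep the order in which they were first seen.
for patient, n in counts.most_common(3):
    print(f"P{patient}: {n}")
```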

Patient 653, a 34-year-old male from Karnataka who got infected from patient 420, passed on the disease to 45 others. Patient 419 passed it on to 34 others. And so on.

Overall in Karnataka, based on data as of tonight, there have been 732 cases where the source (person) of infection has been clearly identified. These 732 cases were transmitted by 205 people. Just two of the 205 (less than 1%) are responsible for 79 people (11% of all cases where the transmitter has been identified) getting infected.

The top 10 “spreaders” in Karnataka are responsible for infecting 260 people, or 36% of all cases where transmission is known. The top 20 spreaders in the state (10% of all spreaders) are responsible for 48% of all cases. The top 41 spreaders (20% of all spreaders) are responsible for 61% of all transmitted cases.
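The shares above come from ranking spreaders by how many people they infected and summing the top slice. A minimal sketch of that calculation, assuming one count per spreader; the helper name `top_share` is hypothetical and the counts are toy values. Rounding the slice size to the nearest person means 10% of 205 spreaders becomes 20 and 20% becomes 41, matching the groups used above.

```python
def top_share(counts, fraction):
    """Share of all traced infections caused by the top `fraction` of spreaders.

    `counts` holds one entry per spreader: how many people they infected.
    The number of spreaders kept is rounded to the nearest whole person
    (at least one).
    """
    ranked = sorted(counts, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Toy counts for ten spreaders, not real data.
counts = [10, 6, 2, 1, 1, 1, 1, 1, 1, 1]
print(top_share(counts, 0.1))  # top 1 spreader: 10 of 25 infections
print(top_share(counts, 0.2))  # top 2 spreaders: 16 of 25 infections
```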

Now you might think this is not as steep as the “well-known” Pareto distribution (the 80-20 rule), except that the 205 people counted here are only around 20% of all potential “spreaders”. Our analysis ignores the 1,000-odd people who were found to have the disease at least a week ago and none of whose contacts have tested positive.
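The denominator effect can be checked with the numbers above: 205 spreaders plus roughly 1,000 people who infected nobody is about 1,205 infected people, and 20% of that is about 241 people, more than the 205 spreaders. So, counted over everyone infected, the top 20% account for all traced transmission. The sketch below shows the same effect on toy numbers; the function name is hypothetical.

```python
def top_share_with_zeros(counts, n_non_spreaders, fraction):
    """Top-`fraction` share of infections when people who infected nobody are ranked too."""
    everyone = sorted(list(counts) + [0] * n_non_spreaders, reverse=True)
    k = max(1, round(fraction * len(everyone)))
    return sum(everyone[:k]) / sum(everyone)


# Toy data: ten spreaders (25 infections between them), not real data.
counts = [10, 6, 2, 1, 1, 1, 1, 1, 1, 1]
print(top_share_with_zeros(counts, 0, 0.2))   # spreaders only: 0.64
print(top_share_with_zeros(counts, 40, 0.2))  # 50 people, top 10 = all spreaders: 1.0
```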

I admit this graph is a little difficult to understand, but basically I’ve ordered people who tested positive for covid-19 in Karnataka by the number of people they’ve passed the infection on to, and graphed how many people they’ve infected cumulatively. It is a very clear Pareto curve.
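The points behind a curve like this can be computed by sorting the per-person counts in descending order and taking a running total. A minimal sketch with toy counts; `cumulative_curve` is a hypothetical helper, and plotting is left out.

```python
from itertools import accumulate


def cumulative_curve(counts):
    """(number of top people, fraction of traced infections they caused) points."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    return [(i, c / total) for i, c in enumerate(accumulate(ranked), start=1)]


# Toy counts, not real data: running totals are 10, 16, 18, 19, 20.
for point in cumulative_curve([1, 10, 2, 1, 6]):
    print(point)
```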

The exact exponent of the power law depends on what you take as the denominator (number of people who could have infected others, having themselves been infected), but the shape of the curve is not in question.
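For a rough exponent, one common choice (swapped in here for illustration; not necessarily what the research above used) is the approximate maximum-likelihood estimator for discrete power laws, alpha ≈ 1 + n / Σ ln(x / (x_min − 1/2)). It can only use people who infected at least `x_min` others, so everyone who infected nobody is dropped. That is one concrete way the choice of denominator shapes the answer. The function name and toy counts are hypothetical.

```python
import math


def powerlaw_alpha(counts, x_min=1):
    """Approximate MLE of a discrete power-law exponent alpha.

    Uses alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))) over counts >= x_min.
    Zero counts (people who infected nobody) cannot enter the fit and are dropped.
    """
    xs = [x for x in counts if x >= x_min]
    if not xs:
        raise ValueError("no counts at or above x_min")
    return 1 + len(xs) / sum(math.log(x / (x_min - 0.5)) for x in xs)


# A heavier tail (one big spreader) gives a lower alpha.
print(powerlaw_alpha([1, 1, 1, 2]))
print(powerlaw_alpha([1, 1, 1, 50]))
```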

Essentially, the Karnataka data validates some research that’s recently come out: most of the disease spread stems from a handful of superspreaders. A very large proportion of infected people don’t pass it on to any of their contacts.