Surveying Income

For a long time now, I’ve been sceptical of the practice of finding out the average income in a country or state or city or locality by doing a random survey. The argument I’ve made is “whether you keep Mukesh Ambani in the sample or not makes a huge difference in your estimate”. So far, though, I hadn’t been able to make a proper mathematical argument.

In the course of writing a piece for Bloomberg Quint (my first for that publication), I figured out a precise mathematical argument. Basically, incomes are distributed according to a power law distribution, and the exponent of the power law means that variance is not defined. And hence the Central Limit Theorem isn’t applicable.

OK let me explain that in English. The reason sample surveys work is due to a result known as the Central Limit Theorem. This states that for a distribution with finite mean and variance, the average of a random sample of data points is not very far from the average of the population, and the difference follows a normal distribution with zero mean and variance that is inversely proportional to the number of points surveyed.

So if you want to find out the average height of the population of adults in an area, you can simply take a random sample, find out their heights and you can estimate the distribution of the average height of people in that area. It is similar with voting intention – as long as the sample of people you survey is random (and without bias), the average of their voting intention can tell you with high confidence the voting intention of the population.

This, however, doesn’t work for income. Based on data from the Indian Income Tax department, I could confirm (what theory states) that income in India follows a power law distribution. As I wrote in my piece:

The basic feature of a power law distribution is that it is self-similar – where a part of the distribution looks like the entire distribution.

Based on the income tax returns data, the number of taxpayers earning more than Rs 50 lakh is 40 times the number of taxpayers earning over Rs 5 crore.
The ratio of the number of people earning more than Rs 1 crore to the number of people earning over Rs 10 crore is 38.
About 36 times as many people earn more than Rs 5 crore as do people earning more than Rs 50 crore.

In other words, if you increase the income limit by a factor of 10, the number of people who earn over that limit falls by a factor between 35 and 40. This translates to a power law exponent between 1.55 and 1.6 (log 35 to base 10 and log 40 to base 10 respectively).
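The arithmetic behind that exponent can be checked in a few lines. This is a minimal sketch, assuming the tail has the form P(income > x) proportional to x^(-alpha), so that multiplying the threshold by 10 divides the count by 10^alpha. The function name tail_exponent is only illustrative.

```python
import math

def tail_exponent(count_ratio, threshold_ratio=10.0):
    """Exponent alpha implied by N(>x) / N(>k*x) = k**alpha."""
    return math.log(count_ratio) / math.log(threshold_ratio)

# Count ratios quoted from the income tax returns data above,
# each for a tenfold increase in the income threshold.
for ratio in (40, 38, 36, 35):
    print(f"count ratio {ratio}: alpha = {tail_exponent(ratio):.2f}")
```

All four ratios give an exponent between roughly 1.54 and 1.60.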

Now power laws have a quirk – their mean and variance are not always defined. If the exponent of the power law is 1 or less, the mean is not defined. If the exponent is 2 or less, then the distribution doesn’t have a defined variance. So in this case, with an exponent around 1.6, the distribution of income in India has a well-defined mean but no well-defined variance.
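To see where those thresholds come from, here is a short derivation. It assumes a pure Pareto tail with minimum income x_m and tail exponent \alpha, that is, P(X > x) = (x_m / x)^\alpha for x \ge x_m; real income data only follows this approximately.

```latex
f(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha + 1}}, \qquad x \ge x_m

\mathbb{E}[X] = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{-\alpha}\,dx
             = \frac{\alpha x_m}{\alpha - 1} \quad \text{(converges only if } \alpha > 1\text{)}

\mathbb{E}[X^2] = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{1 - \alpha}\,dx
               = \frac{\alpha x_m^{2}}{\alpha - 2} \quad \text{(converges only if } \alpha > 2\text{)}
```

With \alpha around 1.6 the first integral converges and the second diverges, which is the "mean but no variance" situation.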

To recall, the central limit theorem states that the sample mean is approximately normally distributed around the population mean, with a variance of σ²/n, where σ is the standard deviation of the underlying distribution and n is the number of points surveyed. And when the underlying distribution itself is a power law distribution with an exponent less than 2 (as is the case in India), σ itself is not defined.

Which means the distribution of population mean around sample mean has infinite variance. Which means the sample mean tells you absolutely nothing!

And hence, surveying is not a good way to find the average income of a population.
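A small simulation makes the contrast with heights concrete. This is only a sketch under made-up assumptions: heights are Gaussian with mean 165 cm and standard deviation 10 cm, incomes follow a pure Pareto distribution with exponent 1.6 (in units of the minimum income), and each "survey" samples 1,000 people. The helper names are illustrative.

```python
import random
import statistics

def survey_means(draw, sample_size, n_surveys, rng):
    """Run n_surveys independent surveys and return each survey's sample mean."""
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(n_surveys)
    ]

def relative_spread(means):
    """(max - min) / median of the survey means: a crude stability measure."""
    return (max(means) - min(means)) / statistics.median(means)

rng = random.Random(2019)  # fixed seed so the run is repeatable

# Heights: Gaussian, mean 165 cm, sd 10 cm (made-up numbers).
heights = survey_means(lambda r: r.gauss(165, 10), 1000, 200, rng)

# Incomes: Pareto with tail exponent 1.6, in units of the minimum income.
incomes = survey_means(lambda r: r.paretovariate(1.6), 1000, 200, rng)

print(f"height surveys: relative spread {relative_spread(heights):.3f}")
print(f"income surveys: relative spread {relative_spread(incomes):.3f}")
```

The 200 height surveys should agree with each other closely, while the income surveys should disagree much more, because a handful of very large draws (the Ambanis of the sample) land in some surveys and not in others.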

Mass marketing and objective journalism

This is a fascinating essay by Antonio García Martinez on the history and future of journalism (possibly paywalled). The money paragraph is this:

The bigger switch happened as a national market for consumer goods opened after the Civil War, when purveyors like department stores wanted to reach large urban audiences. Newspapers responded by increasing the number of ads relative to content, and switched to models that went light on the political partisanship in the interest of expanding circulation. This move was driven not exclusively by lofty ideals but also by mercenary greed. And it worked. Newspapers used to make lots of money. Mountains of money.

Basically, the move to objective journalism came in the late 1800s when advertisers such as Macy’s wanted to take out full page ads, and wanted to do so in newspapers that served the largest sections of the market. And when a newspaper had to reach a large section of the market, it inevitably had to tone down the partisanship, and become more objective.

Over the last decade, we have been witnessing (across the world) the decline of objective media. All media is “#paidmedia” based on which side of the political spectrum you stand on. There aren’t that many truly objective papers around, and social media is bombarded left and right by extremely politicised reporting that passes as “news”.

It is perhaps no coincidence that this period has coincided with a time when print circulation has been dropping steadily (in the developed world at least), and where online advertising can be highly targeted.

In theory, mass marketing is inefficient. When you pay to put up a hoarding somewhere, you’re possibly paying a small amount for each person who sees the hoarding, but not all of them might find it interesting. Consequently, this is reflected in a depressed per-person price for the hoarding, implying that the owner of that real estate can’t make as much as she could if the hoarding were to be more “targeted”.

When you can target your advertisements more precisely, everybody wins. You as the marketer know that your advertisement is only being shown to your intended audience. The owner of the real estate where you put your advertisement can thus charge you more for your advertisement. Even the customer will be less pained by the advertisement if it is highly relevant to her.

Another way of seeing it is – an advertisement shown to a customer who doesn’t want to see it is wasted. The monetary cost of this waste is borne by the owner of the real estate and the advertiser, and the non-monetary cost is borne by the customer (being forced to see something she didn’t want to see). And so one of the biggest technological problems of today is how we can target advertisements better so that we can minimise such costs – and in the last decade and a half, we’ve made significant progress on that front.

The problem with greater efficiency, however, is that it comes with the side-effect of biased media. When Nike knows that it can precisely target an advertisement at American leftwingers, it makes an ad with Colin Kaepernick and shows it to American leftwingers to sell them more shoes.

This doesn’t, however, mean that Nike only sells to left-wingers. The same company can make another advertisement targeted precisely at right-wingers and use it to sell shoes to them!

So now that you can make left-wing and right-wing ads, and you have the ability to target them, you want to cut the waste and place the ads so that you can target as best as possible. In other words, you want to place your left-wing ads in places that only left-wingers want to see, and right-wing ads only in places that right-wingers will see. And so you prefer to advertise in CNN and Fox rather than in a hypothetical “broad market” media outlet.

And the reason you created the politically charged ads in the first place was because there were some outlets (Facebook, for example) where you could precisely target people based on their political orientation. And so you see the vicious cycle – that you can target in some places means you want other places where you can target and that creates demand for more polarised media.

It was the opposite cycle that took effect in the late 1800s and early 1900s. There was no way brands could target too effectively (also, with 1900s technology, each physical advertisement was costly to make, and you wouldn’t want to make one per segment), and so they went mass market in their communication.

And this meant advertising in the outlets that could get them the maximum number of eyeballs. When you can’t discriminate between a “right” and a “wrong” eyeball, you pay based on the number of eyeballs. And the way for media organisations to grow then was to cater to everyone. Which meant less bias and more objectivity and more “features”.

Sadly that cycle is now behind us.

Vlogging!

The first seed was sown in my head by Harish “the Psycho” J, who told me a few months back that nobody reads blogs any more, and I should start making “analytics videos” to increase my reach and hopefully hit a new kind of audience with my work.

While the idea was great, I wasn’t sure for a long time what videos I could make. After all, I’m not the most technical guy around, and I had no patience for making videos on “how to use regression” and stuff like that. I needed a topic that would be both potentially catchy and something where I could add value. So the idea remained an idea.

For the last four or five years, my most common lunchtime activity has been to watch chess videos. I subscribe to the YouTube channels of Daniel King and Agadmator, and most days when I eat lunch alone at home are spent watching their analyses of games. Usually this routine gets disrupted on Fridays when the wife works from home (she positively hates these videos), but one Friday a couple of months back I decided to ignore her anyway and watch the videos (she was in her room working).

She had come out to help herself to another serving of whatever she had made that day, saw me watching the videos, and suddenly asked me why I couldn’t make such videos as well. She has seen me work over the last seven years to build what I think is a fairly cool cricket visualisation, and said that I should use it to make little videos analysing cricket matches.

And since then my constant “background process” has been to prepare for these videos. Earlier, Stephen Rushe of Cricsheet used to unfailingly upload ball by ball data of all cricket matches as soon as they were done. However, two years back he went into “maintenance mode” and has stopped updating the data. And so I needed a method to get data as well.

Here, I must acknowledge the contributions of Joe Harris of White Ball Analytics, who not only showed me the APIs to get ball by ball data of cricket matches, but also gave very helpful inputs on how to make the visualisation more intuitive, and palatable to the normal cricket fan who hasn’t seen such a thing before. Joe has his own win probability model based on ball by ball data, which I think is possibly superior to mine in a lot of scenarios (my model does badly in high-scoring run chases), though I’ve continued to use my own model.

So finally the data is ready, and I have a much improved visualisation compared to what I had during the IPL last year, and I’ve created what I think is a nice app using the Shiny package that you can check out for yourself here. This covers all T20 international games, and you can use the app to see the “story of each game”.

And this is where the vlogging comes in – in order to explain how the model works and how to use it, I’ve created a short video. You can watch it here:

While I still have a long way to go in terms of my delivery, you can see that the video has come out rather well. There are no sync issues, and you see my face also in one corner. This was possible due to my school friend Sunil Kowlgi’s Outklip app. It’s a pretty easy-to-use Chrome app, and the videos are immediately available on the platform. There is quick YouTube integration as well, for you to upload them.

And this is not a one-time effort – going forward I’ll be making videos of limited overs games analysing them using my app, and posting them on my YouTube channel (or maybe I’ll make a new channel for these videos. I’ll keep you updated). I hope to become a regular Vlogger!

So in the meantime, watch the above video. And give my app a spin. Soon I’ll be releasing versions covering One Day Internationals and franchise T20s as well.

 

Volatility and price differentiation

In a rather surreal interview to the rather fantastically named Aurangzeb Naqshbandi and Hindustan Times editor Sukumar Ranganathan, Congress president Rahul Gandhi has made a stunning statement in the context of agricultural markets:

Markets are far more volatile in terms of rapid price differentiation, than they were before.

I find this sentence rather surreal, in that I don’t really know what Gandhi is talking about. As a markets guy and a quant, there is only one way in which I interpret this statement.

It is about how market volatility is calculated. While it might be standard to use standard deviation as a measure of market volatility, quants prefer to use a method called “quadratic variation” (when the market price movement follows a random walk, quadratic variation equals the variance).

To calculate quadratic variation, you take market returns at a succession of very small intervals, square these returns and then sum them up. And thinking about it mathematically, calculating returns at short time intervals is similar to taking the derivative of the price, and you can call it “price differentiation”.
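The calculation described above can be sketched in a few lines. This is a minimal illustration, assuming log returns and evenly spaced price observations; the prices are made up and the function name is only illustrative.

```python
import math

def quadratic_variation(prices):
    """Sum of squared log returns between consecutive price observations."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in returns)

# Made-up prices: both series start and end at 100, but one is choppier.
calm = [100, 101, 100, 101, 100]
choppy = [100, 105, 100, 105, 100]

print(quadratic_variation(calm))
print(quadratic_variation(choppy))
```

Both series have the same start and end price, so the overall return is zero in each case, yet the choppier series has the larger quadratic variation.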

So when Gandhi says “markets are far more volatile in terms of rapid price differentiation”, he is basically quoting the formula for quadratic variation – when the derivative of the price time series goes up, the market volatility increases by definition.

This is what you have, ladies and gentlemen – the president of the principal opposition party in India has quoted the formula that quants use for market volatility in an interview with a popular newspaper! Yet, some people continue to call him “pappu”.

Tigers and Bullwhips

Over three years ago, well before our daughter was born, my wife’s cousin had told us that she likes to watch her daughter’s TV shows because they contained “morals”, which were often useful to her at work. While we never took to the “moral” TV show she mentioned (Daniel Tiger – it is bloody boring), I have begun to notice that there are important management lessons in other popular children’s stories.

So I hereby begin this blog series on what I call the “Kiddie MBA” – basically business lessons from kids’ stories. And we will start with that all-time classic, The Tiger Who Came To Tea, by Judith Kerr.

The basic premise of this story, which remains a classic fifty years after being published, is what operations managers call the “bullwhip effect”. Sometimes a business, possibly in trading, can be subject to a sudden demand, which the business will not be able to fulfil given its current inventories.

As a result of this sudden one-time spurt in demand, the business increases its future forecasts of demand, and starts keeping more inventory. This business’s supplier sees this increased demand and increases its own forecasts upward, and increases its own inventory. Thus, this one-time demand “shock” percolates up the supply chain, giving the illusion of higher demand and with each layer in the chain keeping higher and higher inventory.

And then one day the retailer realises that this demand shock is not going to repeat and moves its forecasts downwards. This triggers a downward revision in the forecasts up the value chain, and demand at the source comes crashing down.
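The mechanism can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are mine, not from any textbook model: each tier forecasts demand as a three-period moving average of the orders it receives, tries to hold two periods of forecast demand as stock, and orders what it just sold plus any change in that stock target. The tier names are hypothetical.

```python
def orders_placed(incoming, window=3, cover=2):
    """Orders one tier sends upstream, given the orders it receives."""
    placed = []
    prev_forecast = incoming[0]
    for t, demand in enumerate(incoming):
        recent = incoming[max(0, t - window + 1): t + 1]
        forecast = sum(recent) / len(recent)
        # Replace what was sold, plus the change in the stock target.
        order = demand + cover * (forecast - prev_forecast)
        placed.append(max(0.0, order))  # orders cannot be negative
        prev_forecast = forecast
    return placed

# Steady demand of 10 per period, with a one-time spike to 40 (the tiger).
consumer = [10] * 5 + [40] + [10] * 6
retailer = orders_placed(consumer)
wholesaler = orders_placed(retailer)
manufacturer = orders_placed(wholesaler)

for name, series in [("consumer", consumer), ("retailer", retailer),
                     ("wholesaler", wholesaler), ("manufacturer", manufacturer)]:
    print(f"{name:12s} peak {max(series):6.1f}  trough {min(series):5.1f}")
```

In this toy setup the single spike of 40 grows at every step up the chain, and each tier's orders later drop to zero once its forecast is revised back down.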

Being a children’s book, The Tiger Who Came To Tea eschews the complexity of the supply chain and instead keeps the story at one level – at the level of the household of the protagonist Sophie (not to be confused with Sophie the Giraffe).

The premise of the story is the demand shock for supplies in Sophie’s home – a tiger comes home for tea and eats up everything that’s at home, drinks up all that’s there to be drunk (including “all the water in the tap”) and leaves, leaving nothing for Sophie and her family.

Assuming that the tiger will return the next day, Sophie’s family stocks up heavily, including “lots of tiger food”. And the tiger never arrives.

My guess is that the rest of the supply chain is left as an exercise to the reader – how the retailer who sold Sophie the tiger food will react to the suddenly higher demand for food (and for tiger food), how this retailer’s supplier will react, whether the tiger visits some other household for tea the next day (making this demand “regular” at the retailer’s level), and so forth.

Perhaps this is what makes this such a great book, and an all-time classic!

Government and markets

It’s been a while since I wrote a post like this one – I remember a decade ago, I used to flood my blog with such stuff.

In any case, last week, in response to the “10yearchallenge” meme, Nitin Pai of Takshashila wrote an Op-Ed in the Print on how India has changed in 10 years. While he admits that the country has grown and the lives of people have improved in some ways, the article leads with the headline that India should be ashamed of what has happened in the last 10 years. This paragraph is possibly representative of the article:

While individual Indians seem to have done well over the past decade, India is more or less where it was. Worse, politics and policy priorities seem to have regressed to 1989.

Reading through the article (I encourage you to read it, it’s good – never mind the headline), I found a clear and distinct pattern in the kind of things where things have gotten better in India and where things have gotten worse.

In everything where markets function, or where the government doesn’t have much of a role, things have changed significantly for the better. In everything where the government has an outsized role, either because it is the government’s job or because the sector is overregulated, things have gotten worse. So our cities have gotten more crowded. Infrastructure has gotten worse. Law and order has regressed. And this has had little to do with the party in power – whatever the government touched has regressed.

Looking at it in another way, Indians seem to be highly capable of making their lives better by coordinating using the invisible hand of the market. However, we seem incapable of making our lives better by coordinating using the government process.

From this perspective, there is one easy way to progress – basically reduce the government. Get rid of the overregulation. Get the government out of things where it shouldn’t be. Give a freer hand to the market.

Unfortunately, ahead of general elections this year, we see most parties taking a highly statist line. This is a real tragedy.

Back to methylphenidate

I can’t remember the last time I was unable to fall asleep. I mean I’ve lost sleep on several days in the last month or two, but on all occasions it’s been after I’d gotten woken up in the middle of my sleep. Today is different – it’s nearly 1 am, and I’ve been in bed for two hours tossing and turning, and completely unable to fall asleep.

I think I left it until it was a bit too late today to restart my methylphenidate, after a three year gap. The dosage is half of what I was used to in 2012-13 and 2015-16. Just 5 milligrams to be taken twice a day. This convinced me that it would be okay to take it in the afternoon. Big mistake. I’ve been completely unable to switch off this evening.

The good thing is that this afternoon ever since I took the tablet I’ve had the kind of hyperfocus I hadn’t been able to achieve for I don’t know how long. I continue to get distracted, but it’s easier to get back to where I was. The big change is that I no longer feel the constant need for stimulation. The need to “feel accelerated”, as I call it, which would result in my opening dozens of tabs on my browser and checking websites one by one without any need to do so. Sometimes it would end in the rabbithole of playing online chess, and wasting hours at a time.

I’ve written about ADHD before on this blog, and elsewhere. I’ve described it as a condition where you’re unable to hold attention on what you are doing, and get distracted easily. In the past I’d come off medication because I missed being distracted – in my methylphenidated state, I have missed the ability to think laterally which I’m so capable of in my “ground state”.

Thinking about it, though, it’s not distraction or the lack of it that’s the problem with ADHD. It’s the constant need for “stimulus”. It’s the constant need to “keep doing something” that makes me fidgety. It’s possibly the same feeling that made me run out of class when I was in kindergarten and do somersaults. The same feeling that would make me open my computer and open a dozen chat windows upon coming home from work a decade ago. Well the latter had its good parts – a lot of the time, one of those dozen chat windows would involve the person who I later married.

It’s funny how I got here today, in this methylphenidated state. As you might know, I’ve been living in London for nearly two years now. And the medical system here is government-run.

In October 2017, when I was in the middle of my last (and largely unsuccessful) full time job, I felt the need to get back on to ADHD medication. I got an appointment with, and met, my general practitioner in November 2017. He asked me to share with him my diagnosis of ADHD from back home. In December 2017 I was back in India, and I got back my medical records, and shared a copy with him in January 2018.

In February 2018 I got a call to set up an appointment with the mental health practice. It was at a clinic some distance away from home, and I met the psychiatrist in March 2018. I was administered the usual ADHD questionnaire and told that I would be contacted by the “national ADHD centre” in a “couple of months”.

It was finally in January of 2019 that I heard back about this. It was my GP once again, saying my prescription for methylphenidate was ready, and I should start taking it asap. The next day I got a call asking me to meet the psychiatrist again, in the faraway mental health clinic. And today I started taking the medication. And I’ve been so unable to switch off that I’m unable to sleep!

PS: I’m publishing this a day late. I wrote this last night but couldn’t publish it since my daughter started crying and I had to rush back to bed. Hopefully I’ll be able to sleep well tonight.