## Correlation and causation

So I have this lecture on “smelling (statistical) bullshit” that I’ve delivered in several places, which I inevitably start with a lesson on how correlation doesn’t imply causation. I give a large number of examples of people mistaking correlation for causation, the class makes fun of everything that doesn’t apply to them, then everyone sees this wonderful XKCD cartoon and then we move on.

One of my favourite examples of correlation-causation (which I don’t normally include in my slides) has to do with religion. Praying before an exam in which one did well doesn’t necessarily imply that the prayer resulted in the good performance in the exam, I explain. So far, there has been no outward outrage at my lectures, but this does visibly make people uncomfortable.

Going off on a tangent, the time in life when I discovered to myself that I’m not religious was when I pondered over the correlation-causation issue some six or seven years back. Until then I’d had this irrational need to draw a relationship between seemingly unrelated things that had happened together once or twice, and that had given me a lot of mental stress. Looking at things from a correlation-causation perspective, however, helped clear up my mind on those things, and also made me believe that most religious activity is pointless. This was a time in life when I got immense mental peace.

Yet, for most of the world, it is not freedom from religion but religion itself that gives them mental peace. People do absurd activities only because they think these activities lead to other good things happening, thanks to a small number of occasions when these things have coincided, either in their own lives or in the lives of their ancestors or gurus.

In one of my lectures a few years back I had remarked that one reason why humans still mistake correlation for causation is religion – for if correlation did not imply causation then most of religious rituals would be rendered meaningless and that would render people’s lives meaningless. Based on what I observed today, however, I think I’ve got this causality wrong.

It’s not because of religion that people mistake correlation for causation. Instead, we’ve evolved to recognise patterns whenever we observe them, and a side effect of that is that we immediately assume causation whenever we see things happening together. Religion is just a special case of application of this correlation-causation second nature to things in real life.

So my daughter (who is two and a half) and I were standing in our balcony this evening, observing that it had rained heavily last night. Heavy rain reminded my daughter of this time when we had visited a particular aunt last week – she clearly remembered watching the heavy rain from this aunt’s window. Perhaps none of our other visits to this aunt’s house really registered in the daughter’s imagination (it’s barely two months since we returned to Bangalore, so admittedly there aren’t that many data points), so this aunt’s house is inextricably linked in her mind to rain.

And this evening because she wanted it to rain heavily again, the daughter suggested that we go visit this aunt once again. “We’ll go to Inna Ajji’s house and then it will start raining”, she kept saying. “Yes, it rained the last time it went there, but it was random. It wasn’t because we went there”, I kept saying. It wasn’t easy to explain it.

You know when you are about to have a kid you develop visions of how you’ll bring her up, and what you’ll teach her, and what she’ll say to “jack” the world. Back then I’d decided that I’d teach my yet-unborn daughter that “correlation does not imply causation” and she could use it use it against “elders” who were telling her absurd stuff.

I hadn’t imagined that mistaking correlation for causation is so fundamental to human nature that it would be a fairly difficult task to actually teach my daughter that correlation does not imply causation! Hopefully in the next one year I can convince her.

## Surveying Income

For a long time now, I’ve been sceptical of the practice of finding out the average income in a country or state or city or locality by doing a random survey. The argument I’ve made is “whether you keep Mukesh Ambani in the sample or not makes a huge difference in your estimate”. So far, though, I hadn’t been able to make a proper mathematical argument.

In the course of writing a piece for Bloomberg Quint (my first for that publication), I figured out a precise mathematical argument. Basically, incomes are distributed according to a power law distribution, and the exponent of the power law means that variance is not defined. And hence the Central Limit Theorem isn’t applicable.

OK let me explain that in English. The reason sample surveys work is due to a result known as the Central Limit Theorem. This states that for a distribution with finite mean and variance, the average of a random sample of data points is not very far from the average of the population, and the difference follows a normal distribution with zero mean and variance that is inversely proportional to the number of points surveyed.

So if you want to find out the average height of the population of adults in an area, you can simply take a random sample, find out their heights and you can estimate the distribution of the average height of people in that area. It is similar with voting intention – as long as the sample of people you survey is random (and without bias), the average of their voting intention can tell you with high confidence the voting intention of the population.

This, however, doesn’t work for income. Based on data from the Indian Income Tax department, I could confirm (what theory states) that income in India follows a power law distribution. As I wrote in my piece:

The basic feature of a power law distribution is that it is self-similar – where a part of the distribution looks like the entire distribution.

Based on the income tax returns data, the number of taxpayers earning more than Rs 50 lakh is 40 times the number of taxpayers earning over Rs 5 crore.
The ratio of the number of people earning more than Rs 1 crore to the number of people earning over Rs 10 crore is 38.
About 36 times as many people earn more than Rs 5 crore as do people earning more than Rs 50 crore.

In other words, if you increase the income limit by a factor of 10, the number of people who earn over that limit falls by a factor between 35 and 40. This translates to a power law exponent between 1.55 and 1.6 (log 35 to base 10 and log 40 to base 10 respectively).

Now power laws have a quirk – their mean and variance are not always defined. If the exponent of the power law is less than 1, the mean is not defined. If the exponent is less than 2, then the distribution doesn’t have a defined variance. So in this case, with an exponent around 1.6, the distribution of income in India has a well-defined mean but no well-defined variance.

To recall, the central limit theorem states that the population mean follows a normal distribution with the mean centred at the sample mean, and a variance of $\frac{\sigma^2}{n}$ where $\sigma$ is the standard deviation of the underlying distribution. And when the underlying distribution itself is a power law distribution with an exponent less than 2 (as the case is in India), $\sigma$ itself is not defined.

Which means the distribution of population mean around sample mean has infinite variance. Which means the sample mean tells you absolutely nothing!

And hence, surveying is not a good way to find the average income of a population.

## Elegant and practical solutions

There are two ways in which you can tie a shoelace – one is the “ordinary method”, where you explicitly make the loops around both ends of the lace before tying together to form a bow. The other is the “elegant method” where you only make one loop explicitly, but tie with such great skill that the bow automatically gets formed.

I have never learnt to tie my shoelaces in the latter manner – I suspect my father didn’t know it either, because of which it wasn’t passed on to me. Metaphorically, however, I like to implement such solutions in other aspects.

Having been educated in mathematics, I’m a sucker for “elegant solutions”. I look down upon brute force solutions, which is why I might sometimes spend half an hour writing a script to accomplish a repetitive task that might have otherwise taken 15 minutes. Over the long run, I believe, this elegance will pay off, in terms of scaling easier.

And I suspect I’m not alone in this love for elegance. If the world were only about efficiency, brute force would prevail. That we appreciate things like poetry and music and art and what not means that there is some preference for elegance. And that extends to business solutions as well.

While going for elegance is a useful heuristic, sometimes it can lead to missing the woods for the trees (or missing the random forests for the decision trees if you may will). For there are situations that simply don’t, or won’t, scale, and where elegance will send you on a wild goose chase while a little fighter work will get the job done.

I got reminded of this sometime last week when my wife asked me for some Excel help in some work she was doing. Now, there was a recent article in WSJ which claimed that the “first rule of Microsoft Excel is that you shouldn’t let people know you’re good at it”. However, having taught a university course on spreadsheet modelling, there is no place to hide for me, and people keep coming to me for Excel help (though it helps I don’t work in an office).

So the problem wasn’t a simple one, and I dug around for about half an hour without a solution in sight. And then my wife happened to casually mention that this was a one-time thing. That she had to solve this problem once but didn’t expect to come across it again, so “a little manual work” won’t hurt.

And the problem was solved in two minutes – a minor variation of the requirement was only one formula away (did you know that the latest versions of Excel for Windows offer a “count distinct” function in pivot tables?). Five minutes of fighter work by the wife after that completely solved the problem.

Most data scientists (now that I’m not one!)  typically work in production environments, where the result of their analysis is expressed in code that is run on a repeated basis. This means that data scientists are typically tuned to finding elegant solutions since any manual intervention means that the code is not production-able and scalable.

This can mean finding complicated workarounds in order to “pull the bow of the shoelaces” in order to avoid that little bit of manual effort at the end, so that the whole thing can be automated. And these habits can extend to the occasional work that is not needed to be repeatable and scalable.

And so you have teams spending an inordinate amount of time finding elegant solutions for problems for which easy but non-scalable “solutions exist”.

Elegance is a hard quality to shake off, even when it only hinders you.

I’ll close with a fairytale – a deer looks at its reflection and admires its beautiful anchors and admonishes its own ugly legs. Lion arrives, the ugly legs help the deer run fast, but the beautiful antlers get stuck in a low tree, and the lion catches up.

## JEE coaching and high school learning

One reason I’m not as good at machine learning as I can possibly be is because I suck at linear algebra. I totally completely suck at it. Seven years of usage of R has meant that at least I no longer get spooked out by the very sight of vectors or matrices, and I understand the concept of matrix multiplication (an operator rotating a vector), but I just don’t get linear algebra.

For example, when I see terms such as “singular value decomposition” I almost faint. Multiple repeated attempts at learning the concept have utterly failed. Don’t even get me started on the more complicated stuff – and machine learning is full of them.

My inability to understand linear algebra runs deep, and it’s mainly due to a complete inability to imagine vectors and matrices and matrix operations. As far back as I remember, I have hated matrices and have tried to run away from it.

For a long time, I had placed the blame for this on IIT Madras, whose mathematics department in its infinite wisdom had decided to get its brilliant Graph Theory expert to teach us matrices. Thinking back, though, I remember going in to MA102 (Vectors, Matrices and Differential Equations) already spooked. The rot had set in even earlier – in school.

The problem with class 11 in my school (a fairly high-profile school which was full of studmax characters) was that most people harboured ambitions of going to IIT, and had consequently enrolled themselves in formal coaching “factories”. As a result, these worthies always came to maths, physics and chemistry classes “ahead” of people like me who didn’t go for such classes (I’d decided to chill for a year after a rather hectic class 10 when I’d been under immense pressure to get my school a “centum”).

Because a large majority of the class already knew what was to be taught, teachers had an incentive to slack. Also the fact that most students were studmax had meant that people preferred to mug on their own rather than display their ignorance in class. And so jai happened.

I remember the class when vectors and matrices were introduced (it was in class 11). While I don’t remember too many details, I do remember that a vocal majority already knew about “dot product” and “cross product”. It was similar a few days later when the vocal majority knew matrix multiplication.

And so these concepts were glossed over, and lacking a grounding in fundamentals, I somehow never “got” the concept.

In my year (2000), CBSE decided to change format for its maths examination – everyone had to attempt “Part A” (worth 70 marks) and then had a choice between “Part B” (vectors, matrices, etc.) and “Part C” (introductory statistics). Most science students were expected to opt for Part B (Part C had been introduced for the benefit of commerce students studying maths since they had little to gain from reading about vectors). For me and one other guy from my class, though, it was a rather obvious choice to do Part C.

I remember the invigilator (who was from another school) being positively surprised during my board exam when I mentioned that I was going to attempt Part C instead of Part B. He muttered something to the extent of “isn’t that for commerce students?” but to his credit permitted us to do the paper in whatever way we wanted (I fail to remember why I had to mention to him I was doing Part C – maybe I needed log tables to do that).

Seventeen odd years down the line, I continue to suck at linear algebra and be stud at statistics. And it is all down to the way the two subjects were introduced to me in school (JEE statistics wasn’t up to the same standard as Part C so the school teachers did a great job of teaching that).

## The Quants

Since investment bank bashing seems to be in fashion nowadays, let me add my two naya paise to the fire. I exited a large investment bank in September 2011, after having worked for a little over two years there. I used to work as a quant, spending most of my time building pricing and execution models. I was a bit of an anomaly there, since I had an MBA degree. What was also unusual was that I had previously spent time as a salesperson in an investment bank . Most other people in the quant organization came from a heavily technical background, with the most popular degrees being PhDs in Physics and Maths, and had no experience or interest in the business side of things at the bank.

You might wonder what PhDs in Physics and Maths do at investment banks. I used to wonder the same before I joined. Yes, there are some tough mathematical puzzles to be solved in the course of devising pricing and execution algorithms (part of the work that us quants did), which probably kept them interested. However, the one activity for which these pure science PhDs were prized for, and which they spent most of their time doing, was C++ coding. Yeah, you read that right. These guys could write mean algorithms – I don’t know if even Computer Science graduates (and there were plenty of those) could write as clean (and quick) C++ code as these guys.

While most banks stress heavily on diversity, and makes considerable efforts (in the form of recruitment, affiliation groups, etc.)  to ensure a diverse workplace, it is not enough to prevent a large portion of quants coming from a similar kind of background. And when you put large numbers of Physics and Math PhDs together, it is inevitable that there is some degree of groupthink. You have the mavericks like me who like to model things differently, but if everyone else in your organization thinks one way, who do you go to in order to push your idea? You stop dropping your own ideas and start thinking like everyone else does. And you become yet another cog in the big quant wheel.

The biggest problem with hardcore Math people working on trading strategies is that they do not seek to solve a business problem through their work – they seek to solve a math problem, which they will strive to do as elegantly and correctly as it is possible. It doesn’t matter to the quants if the assumption of asset prices being lognormal is widely off the mark. In fact, they don’t care how the models behave. All they care about is about their formulae and results being correct – GIVEN the model of the market. I remember once spending a significant amount of time (maybe a couple of weeks) looking for bugs in my pricing logic because prices from two methods didn’t match up to the required precision of twelve decimal places (or was it fourteen? I’ve forgotten). And this after making the not-very-accurate assumption that asset prices are log normal. The proverb that says, “measure with a micrometer, mark with a chalk, cut with an axe”, is quite apt to describe the priorities of most quants.

Before I joined the firm, I used to wonder how bankers can be so stupid to make the kind of obvious silly errors (like assuming that housing prices cannot go down) that led to the global financial crisis of 2008. Two years at the firm, however, made me realize why these things happen. In fact, the bigger surprise, after the two years there, was about why such gross mistakes don’t occur more regularly. I think I’ve already talked about the culprits earlier in the post, but I should repeat myself.

First, a large number of guys building models come from similar backgrounds, so they think similarly. Because so many people think similarly, the rest train themselves to think similarly (or else get nudged out, by whatever means). So you have massive organizations full of massively talented brilliant minds which all think similarly! Who is to ask the uncomfortable questions? Next, who has time to ask the uncomfortable questions? Every one, from Partner downwards, has significant amount of “day to day work” to take care of every day. Bankers are driven hard (in that sense, and in that they are mostly brilliant, they do deserve the money they make), and everyone has a full plate (if you don’t it is an indication that you may not have a plate any more). There is little scope for strategic thinking. Again, remember that in an organization full of people who think similarly, people who have got promoted and made it to the top are likely to be those that think best along that particular axis. While it is the top management of the firm that is supposed to be responsible for the “big” strategic decisions, the kind of attention to details (which Math/Physics PhDs are rich in) that takes them to the top doesn’t leave them enough bandwidth for such thinking.

And so shit happens. Anyone who had the ability to think differently has either been “converted” to the conventional way of thinking, or is playing around with big bucks at some tiny hedge fund somewhere – because he found that it wasn’t possible to grow significantly in a place where most people think different to the way he thinks, and no one has the patience for his thinking.

This is the real failure in investment banking (markets) culture that has led to innumerable crises. The screwing over of clients and loss of “culture” in terms of ethics is a problem that has existed for a long time, and nothing new, contrary to what Greg Smith (formerly of Goldman Sachs) has written. The real failure of banking culture is this promotion of one-dimensional in-line-with-the-party thought, and the curbs against thinking and acting contrary to popular (in the firm) wisdom. It is this failure of culture that has led to the large negative shocks to the economy in the years gone by, and it is these shocks that have led common people to lose money rather than one off acts by banks where they don’t necessarily act in the interest of clients. And irrespective of how many Business Standards Committees and Risk Committees banks constitute, it is unlikely that this risk is going to go away any time soon. And I can’t think of a regulatory cure against this.