Correlation and causation

So I have this lecture on “smelling (statistical) bullshit” that I’ve delivered in several places, which I inevitably start with a lesson on how correlation doesn’t imply causation. I give a large number of examples of people mistaking correlation for causation, the class makes fun of everything that doesn’t apply to them, then everyone sees this wonderful XKCD cartoon and then we move on.

One of my favourite examples of correlation-causation (which I don’t normally include in my slides) has to do with religion. Praying before an exam in which one did well doesn’t necessarily imply that the prayer resulted in the good performance in the exam, I explain. So far, there has been no outward outrage at my lectures, but this does visibly make people uncomfortable.

Going off on a tangent, the time in life when I discovered to myself that I’m not religious was when I pondered over the correlation-causation issue some six or seven years back. Until then I’d had this irrational need to draw a relationship between seemingly unrelated things that had happened together once or twice, and that had given me a lot of mental stress. Looking at things from a correlation-causation perspective, however, helped clear up my mind on those things, and also made me believe that most religious activity is pointless. This was a time in life when I got immense mental peace.

Yet, for most of the world, it is not freedom from religion but religion itself that gives them mental peace. People do absurd activities only because they think these activities lead to other good things happening, thanks to a small number of occasions when these things have coincided, either in their own lives or in the lives of their ancestors or gurus.

In one of my lectures a few years back I had remarked that one reason why humans still mistake correlation for causation is religion – for if correlation did not imply causation then most of religious rituals would be rendered meaningless and that would render people’s lives meaningless. Based on what I observed today, however, I think I’ve got this causality wrong.

It’s not because of religion that people mistake correlation for causation. Instead, we’ve evolved to recognise patterns whenever we observe them, and a side effect of that is that we immediately assume causation whenever we see things happening together. Religion is just a special case of application of this correlation-causation second nature to things in real life.

So my daughter (who is two and a half) and I were standing in our balcony this evening, observing that it had rained heavily last night. Heavy rain reminded my daughter of this time when we had visited a particular aunt last week – she clearly remembered watching the heavy rain from this aunt’s window. Perhaps none of our other visits to this aunt’s house really registered in the daughter’s imagination (it’s barely two months since we returned to Bangalore, so admittedly there aren’t that many data points), so this aunt’s house is inextricably linked in her mind to rain.

And this evening because she wanted it to rain heavily again, the daughter suggested that we go visit this aunt once again. “We’ll go to Inna Ajji’s house and then it will start raining”, she kept saying. “Yes, it rained the last time it went there, but it was random. It wasn’t because we went there”, I kept saying. It wasn’t easy to explain it.

You know when you are about to have a kid you develop visions of how you’ll bring her up, and what you’ll teach her, and what she’ll say to “jack” the world. Back then I’d decided that I’d teach my yet-unborn daughter that “correlation does not imply causation” and she could use it use it against “elders” who were telling her absurd stuff.

I hadn’t imagined that mistaking correlation for causation is so fundamental to human nature that it would be a fairly difficult task to actually teach my daughter that correlation does not imply causation! Hopefully in the next one year I can convince her.

English Premier League: Goal Difference to points correlation

So I was just looking down the English Premier League Table for the season, and I found that as I went down the list, the goal difference went lower. There’s nothing counterintuitive in this, but the degree of correlation seemed eerie.

So I downloaded the data and plotted a scatter-plot. And what do you have? A near-perfect regression. I even ran the regression and found a 96% R Square.

In other words, this EPL season has simply been all about scoring lots of goals and not letting in too many goals. It’s almost like the distribution of the goals itself doesn’t matter – apart from the relegation battle, that is!

PS: Look at the extent of Manchester City’s lead at the top. And what a scrap the relegation is!

Biases, statistics and luck

Tomorrow Liverpool plays Manchester City in the Premier League. As things stand now I don’t plan to watch this game. This entire season so far, I’ve only watched two games. First, I’d gone to a local pub to watch Liverpool’s visit to Manchester City, back in September. Liverpool got thrashed 5-0.

Then in October, I went to Wembley to watch Tottenham Hotspur play Liverpool. The Spurs won 4-1. These two remain Liverpool’s only defeats of the season.

I might consider myself to be a mostly rational person but I sometimes do fall for the correlation-implies-causation bias, and think that my watching those games had something to do with Liverpool’s losses in them. Never mind that these were away games played against other top sides which attack aggressively. And so I have this irrational “fear” that if I watch tomorrow’s game (even if it’s from a pub), it might lead to a heavy Liverpool defeat.

And so I told Baada, a Manchester City fan, that I’m not planning to watch tomorrow’s game. And he got back to me with some statistics, which he’d heard from a podcast. Apparently it’s been 80 years since Manchester City did the league “double” (winning both home and away games) over Liverpool. And that it’s been 15 years since they’ve won at Anfield. So, he suggested, there’s a good chance that tomorrow’s game won’t result in a mauling for Liverpool, even if I were to watch it.

With the easy availability of statistics, it has become a thing among football commentators to supply them during the commentary. And from first hearing, things like “never done this in 80 years” or “never done that for last 15 years” sounds compelling, and you’re inclined to believe that there is something to these numbers.

I don’t remember if it was Navjot Sidhu who said that statistics are like a bikini (“what they reveal is significant but what they hide is crucial” or something). That Manchester City hasn’t done a double over Liverpool in 80 years doesn’t mean a thing, nor does it say anything that they haven’t won at Anfield in 15 years.

Basically, until the mid 2000s, City were a middling team. I remember telling Baada after the 2007 season (when Stuart Pearce got fired as City manager) that they’d be surely relegated next season. And then came the investment from Thaksin Shinawatra. And the appointment of Sven-Goran Eriksson as manager. And then the youtube signings. And later the investment from the Abu Dhabi investment group. And in 2016 the appointment of Pep Guardiola as manager. And the significant investment in players after that.

In other words, Manchester City of today is a completely different team from what they were even 2-3 years back. And they’re surely a vastly improved team compared to a decade ago. I know Baada has been following them for over 15 years now, but they’re unrecognisable from the time he started following them!

Yes, even with City being a much improved team, Liverpool have never lost to them at home in the last few years – but then Liverpool have generally been a strong team playing at home in these years! On the other hand, City’s 18-game winning streak (which included wins at Chelsea and Manchester United) only came to an end (with a draw against Crystal Palace) rather recently.

So anyways, here are the takeaways:

  1. Whether I watch the game or not has no bearing on how well Liverpool will play. The instances from this season so far are based on 1. small samples and 2. biased samples (since I’ve chosen to watch Liverpool’s two toughest games of the season)
  2. 80-year history of a fixture has no bearing since teams have evolved significantly in these 80 years. So saying a record stands so long has no meaning or predictive power for tomorrow’s game.
  3. City have been in tremendous form this season, and Liverpool have just lost their key player (by selling Philippe Coutinho to Barcelona), so City can fancy their chances. That said, Anfield has been a fortress this season, so Liverpool might just hold (or even win it).

All of this points to a good game tomorrow! Maybe I should just watch it!

 

 

Scott Adams, careers and correlation

I’ve written here earlier about how much I’ve been influenced by Scott Adams’s career advice about “being in top quartile of two or more things“.  To recap, this is what Adams wrote nearly ten years back:

If you want an average successful life, it doesn’t take much planning. Just stay out of trouble, go to school, and apply for jobs you might like. But if you want something extraordinary, you have two paths:

1. Become the best at one specific thing.
2. Become very good (top 25%) at two or more things.

The first strategy is difficult to the point of near impossibility. Few people will ever play in the NBA or make a platinum album. I don’t recommend anyone even try.

Having implemented this to various degrees of success over the last 5-6 years, I propose a small correction – basically to follow the second strategy that Adams has mentioned, you need to take correlation into account.

Basically there’s no joy in becoming very good (top 25%) at two or more correlated things. For example, if you think you’re in the top 25% in terms of “maths and physics” or “maths and computer science” there’s not so much joy because these are correlated skills. Lots of people who are very good at maths are also very good at physics or computer science. So there is nothing special in being very good at such a combination.

Why Adams succeeded was that he was very good at 2-3 things that are largely uncorrelated – drawing, telling jokes and understanding corporate politics are not very correlated to each other. So the combination of these three skills of his was rather unique to find, and their combination resulted in the wildly successful Dilbert.

So the key is this – in order to be wildly successful, you need to be very good (top 25%) at two or three things that are not positively correlated with each other (either orthogonal or negative correlation works). That ensures that if you can put them together, you can offer something that very few others can offer.

Then again, the problem there is that the market for this combination of skills will be highly illiquid – low supply means people who might demand such combinations would have adapted to make do with some easier to find substitute, so demand is lower, and so on. So in that sense, again, it’s a massive hit-or-miss!

Writing and depression

It is now a well-documented fact (that I’m too lazy to google and provide links) that there exists a relationship between mental illness and creative professions such as writing.

Most pieces that talk about this relationship draw the causality in one way – that the mental illness helped the writer (or painter or filmmaker or whoever) focus and channel emotions into the product.

Having taken treatment for depression in the past, and having just finished a manuscript of a book, I might tend to agree that there exists a relationship between creativity and depression. However, I wonder if the causality runs the other way.

I’ve mentioned here a couple of months back that writing a book is hard because you are working months together with little tangible feedback, and there’s a real possibility that it might flop miserably. Soncequently, you put fight to make the product as good as you can.

In the absence of feedback, you are your greatest critic, and you read, and re-read what you’ve written; you edit, and re-edit your passages until you’re convinced that they’re as good as they can be.

You get obsessed with your product. You start thinking that if it’s not perfect it is all doomed. You downplay the (rather large) random component that might affect the success of the product, and instead focus on making it as perfect as you can.

And this obsession can drive you mad. There are days when you sit with your manuscript and feel useless. There are times when you want to chuck months’ effort down the drain. And that depresses you. And affects other parts of your life, mostly negatively!

Again it’s rather early that I’m writing this blog post now – at a time when I’m yet to start marketing my book to publishers. However, it’s important that I document this relationship and causality now – before either spectacular success or massive failure take me over!

Correlation in defence purchases

Nitin Pai has a nice piece on defence procurement in Business Standard today. He writes:

Even if the planning process works as intended, it still means that the defence ministry merely adds up the individual requirements and goes about buying them. This is sub-optimal: consider a particular emerging threat that everyone agrees India needs to be prepared for. The army, navy and air force then prepare their own strategies and operational plans, for which they draw up a list of requirements. At the back of their minds, they know that the defence budget is more-or-less divided in a fixed ratio among them.

What he is saying, in other words, is that the defence ministry simply takes the arithmetic sum of demands from various components of the military, rather than taking correlation into account.

Let me explain using a toy example.

Let’s say that the Western wing of the Indian army (I’m making this up), the one that guards the border with Pakistan, wants 100 widgets that will come useful in case of a war. Let’s say that the Eastern wing of the Indian army, which guards the China border, wants 150 such widgets for the same purpose. The question is how many you should purchase.

According to Nitin, the defence ministry now doesn’t think. It simply adds up and buys 250. The question is if we actually need 250.

Let’s assume that these widgets are easily transportable, and let’s assume that the probability of a simultaneous conventional conflict with Pakistan and China is zero (given all three are nuclear states, this is a fair assumption). Do we still need 250 widgets? The answer is no, we only need 150, since we can quickly swing them over to where they are most required, and at the maximum, we need 150!

This is a case of negative correlation. There could be a case of positive correlation also – perhaps the chance of an India-China conventional conflict actually goes up when an India-Pakistan conventional conflict is on, and this might lead to more prolonged battles, meaning we might need more than 250 widgets! Or we have positive correlation.

The most famous example of ignoring correlation was the 2008 financial crisis, when ignored positive correlation led to mortgage backed securities and their derivatives blowing up. The Indian defence ministry can’t afford such a mistake.

Narendra Modi and the Correlation Term

In a speech in Canada last night, Prime Minister Narendra Modi said that the relationship between India and Canada is like the “2ab term” in the formula for expansion of (a+b)^2.

Unfortunately for him, this has been widely lampooned on twitter, with some people seemingly not getting the mathematical reference, and others making up some unintended consequences of it.

In my opinion, however, it is a masterstroke, and brings to notice something that people commonly ignore – what I call as the “correlation term”. When any kind of break up or disagreement happens – like someone quitting a job, or a couple breaking up, or a band disbanding, people are bound to ask the question of whose fault it was. The general assumption is that if two entities did not agree, it was because both of them sucked.

However, considering the frequency at which such events (breakups or disagreements ) happen, and that people who are generally “good” are involved in such events, the badness of one of the parties involve simply cannot explain them. So the question arises – if both parties were flawless why did the relationship go wrong? And this is where the correlation term comes in!

It is rather easy to explain using vector calculus. If you have two vectors A and B, the magnitude of the sum of the two vectors is given by \sqrt{|A|^2 + |B|^2 + 2 |A||B| cos \theta} where |A|,|B| are the magnitudes of the two vectors respectively and \theta is the angle between them. It is easy to see from the above formula that the magnitude of the sum of the vectors is dependent not only on the magnitudes of the individual vectors, but also on the angle between them.

To illustrate with some examples, if A and B are perfectly aligned (\theta = 0, cos \theta = 1), then the magnitude of their vector sum is the sum of their magnitudes. If they oppose each other, then the magnitude of their vector sum is the difference of their magnitudes. And if A and B are orthogonal, then cos \theta = 0 or the magnitude of their vector sum is \sqrt{|A|^2 + |B|^2}.

And if we move from vector algebra to statistics, then if A and B represent two datasets, the “cos \theta” is nothing but the correlation between A and B. And in the investing world, correlation is a fairly important and widely used concept!

So essentially, the concept that the Prime Minister alluded to in his lecture in Canada is rather important, and while it is commonly used in both science and finance, it is something people generally disregard in their daily lives. From this point of view, kudos to the Prime Minister for bringing up this concept of the correlation term! And here is my interpretation of it:

At first I was a bit upset with Modi because he only mentioned “2ab” and left out the correlation term (\theta). Thinking about it some more, I reasoned that the reason he left it out was to imply that it was equal to 1, or that the angle between the a and b in this case (i.e. India and Canada’s interests) is zero, or in other words, that India and Canada’s interests are perfectly aligned! There could have been no better way of putting it!

So thanks to the Prime Minister for bringing up this rather important concept of correlation to public notice, and I hope that people start appreciating the nuances of the concept rather than brainlessly lampooning him!