Conductors and CAPM

For a long time I used to wonder why orchestras have conductors. I possibly first noticed the presence of the conductor sometime in the 1990s when Zubin Mehta was in the news. And then I always wondered why this person, who didn’t play anything but stood there waving a stick, needed to exist. Couldn’t the orchestra coordinate itself like rockstars or practitioners of Indian music forms do?

And then I came across this video a year or two back.

And then the computer science training I’d gone through two decades back kicked in – the job of an orchestra conductor is to reduce an O(n^2) problem to an O(n) problem.

For a group of musicians to make music, they need to coordinate with each other. Yes, they have the staff notation and all that, but still they need to know when to speed up or slow down, when to make what transitions, etc. They may have practiced together but the professional performance needs to be flawless. And so they need to constantly take cues from each other.

When you have n musicians who need to coordinate, you have n(n-1)/2 pairs of people who need to coordinate with each other. When n is small, this is trivial, and so you see that small ensembles or rock bands can coordinate easily. However, as n gets large, n^2 grows much faster than n. And that is a problem, and a risk.

Enter the conductor. Rather than taking cues from one another, the musicians now simply need to take cues from this one person. And so there are now only n pairs that need to coordinate – each musician in the band with the conductor. Or an O(n^2) problem has become an O(n) problem!
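
To make the counting concrete, here is a minimal Python sketch (the ensemble sizes are purely illustrative):

```python
# Coordination links needed with and without a conductor.

def pairwise_links(n):
    """Every musician takes cues from every other musician: n(n-1)/2 pairs, O(n^2)."""
    return n * (n - 1) // 2

def conductor_links(n):
    """Every musician takes cues only from the conductor: n links, O(n)."""
    return n

for n in [4, 10, 40, 100]:
    print(f"n={n:>3}: pairwise={pairwise_links(n):>5}  with conductor={conductor_links(n):>3}")
```

For a quartet the difference barely matters (6 links versus 4), but for a hundred-piece orchestra it is 4950 versus 100.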

For whatever reason, while I was thinking about this yesterday, I got reminded of legendary finance professor R Vaidya's class on the capital asset pricing model (CAPM), or as he put it, the "Sharpe single index model" (surprisingly all the links I find for this are from Indian test prep sites, so not linking).

We had just learnt portfolio theory, and how, using the expected returns, variances and correlations of a set of securities, we could construct an “efficient frontier” that gave us the best risk-adjusted return. It seemed mathematically very elegant, except that to construct a portfolio of n stocks, you needed n(n-1)/2 pairwise correlations. In other words, an O(n^2) problem.

And then Vaidya introduced CAPM, which magically reduced the problem to an O(n) problem. By introducing the concept of an index, all that mattered for each stock now was its beta – the regression coefficient of its returns against the index returns. You didn’t need to care about how stocks moved with each other any more – all you needed was each stock’s relationship with the index.
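
A minimal sketch of what this buys you, with made-up betas and variances (none of these numbers are from the original class or post): under the single index model, the covariance between any two stocks falls out of their betas and the index variance, so you estimate n betas instead of O(n^2) pairwise covariances.

```python
import numpy as np

# Hypothetical inputs: each stock's beta against the index, the variance of
# index returns, and each stock's idiosyncratic (residual) variance.
betas = np.array([0.8, 1.1, 1.4, 0.6])
index_var = 0.04
idio_var = np.array([0.02, 0.03, 0.05, 0.01])

# Single index model: cov(i, j) = beta_i * beta_j * var(index) for i != j,
# and var(i) = beta_i^2 * var(index) + idiosyncratic variance of stock i.
cov = np.outer(betas, betas) * index_var
cov[np.diag_indices_from(cov)] += idio_var

print(np.round(cov, 4))
```

The full Markowitz treatment would instead need every pairwise covariance estimated directly from the data – exactly the O(n^2) input the single index model lets you sidestep.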

In a sense, if you think about it, the index in CAPM is like the conductor of an orchestra. If only all O(n^2) problems could be reduced to O(n) problems this elegantly!

Exponential need not mean explosive

Earlier on this blog I’ve written about the misuse of the term “exponential” when it is used to describe explosive increase in a particular number. My suspicion is that this misuse of the word “exponential” comes from computer science and complexity theory – where the hardest problems to crack are those which require time or space that is exponential in the size of the input. In fact, the distinction between P and NP-complete problems is, loosely speaking, about problems that can be solved with resources that are a polynomial function of the size of the input versus those that are believed to require exponential resources.
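
As a quick, purely illustrative comparison (the two cost functions below are not tied to any particular problem), this is what “polynomial versus exponential in the size of the input” looks like:

```python
# Compare a polynomial cost (n^3) with an exponential cost (2^n) as the
# input size n grows: the exponential soon overtakes and then dwarfs it.
for n in [5, 10, 20, 30, 40]:
    print(f"n={n:>2}: n^3={n**3:>8,}  2^n={2**n:>18,}")
```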

Earlier today, I shared this blog post by Bryan Caplan on Puerto Rican immigration into the United States with a comment “exponential immigration”. I won’t rule out drawing some flak for this particular description, for Caplan’s thesis is that Puerto Rican immigration took a long time indeed to “explode”. However, I would expect that the flak I get for describing this variable as “exponential” would come from people who mistake “exponential” for “explosive”.

Caplan’s theory in the above linked blog post is that immigration from Puerto Rico to the United States was extremely slow for a very long time. It was in the late 1890s that a US Supreme Court ruling allowed Puerto Ricans free access to the United States. However, it took close to a hundred years for this immigration to “explode”. His explanation is that the number of people moving to the US per year is a function of the number of Puerto Ricans who are already there!

In other words, the immigration process can be described by our favourite equation: dX/dt = kX, solving which we get X = X_0 exp(kt), which means that the growth is indeed exponential in time! Yet, given a rather small value of X_0 (the number of Puerto Ricans in the United States at the time the law was passed), and given a small value of k, the increase has been anything but explosive, despite being exponential!
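
A minimal sketch of the same point, with made-up values of X_0 and k (not taken from Caplan's post): an exponential curve with a small starting value and a small growth rate looks flat for decades before it “explodes”.

```python
import math

X0 = 100   # hypothetical initial number of migrants already in the US
k = 0.05   # hypothetical growth rate per year

# X(t) = X0 * exp(k * t): exponential in time throughout, yet the early
# decades look anything but explosive.
for t in [0, 10, 25, 50, 75, 100]:
    print(f"year {t:>3}: {X0 * math.exp(k * t):>10,.0f}")
```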

The point of this post is worth reiterating: the word “exponential”, in its common use, has been taken to be synonymous with “explosive”, and this is wrong. Exponential growth need not be explosive, and explosive growth need not be exponential! The two concepts are unrelated and people would do well to not confuse one with the other.


Analytics and complexity

I recently learnt that a number of people think that the more variables you use in your model, the better your model is! What has surprised me is just how many people think so, and how unkindly recommendations for simpler models have been received.

The conversation usually goes like this:

“so what variables have you considered for your analysis of ______ ?”
“A,B,C”
“Why don’t you consider D,E,F,… X,Y,Z also? These variables matter for these reasons. You should keep all of them and build a more complete model”
“Well I considered them but they were not significant so my model didn’t pick them up”
“No but I think your model is too simplistic if it uses only three variables”

This is a conversation I’ve had with so many people that I wonder what conceptions people have about analytics. Perhaps it comes down to the difference in the way I communicate compared to other “analytics professionals”.
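
For what it’s worth, the “not significant, so my model didn’t pick them up” line in that exchange is easy to make concrete. Here is a hypothetical sketch – the data, the variable names and the 5% cutoff are all made up for illustration, and the analyses in question need not have used exactly this technique:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: only A, B and C actually drive the outcome;
# D, E and F are noise, however plausible they sound in a meeting.
X = rng.normal(size=(n, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(size=n)

names = ["A", "B", "C", "D", "E", "F"]
fit = sm.OLS(y, sm.add_constant(X)).fit()

# Keep only the variables whose coefficients are statistically significant.
for name, p in zip(names, fit.pvalues[1:]):
    verdict = "keep" if p < 0.05 else "drop (not significant)"
    print(f"{name}: p-value = {p:.3f} -> {verdict}")
```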

When you do analytics, there are two ways to communicate – to simplify and to complicate (for lack of a better word). In my experience, a majority of analytics professionals and modelers prefer to complicate – they talk up the complicated statistical techniques they use to solve the problem (usually with fancy names) and bulldoze the counterparty into thinking they are indeed doing something hi-funda.

The other approach, followed by (in my opinion) a smaller number of people, is to simplify. You try and explain your model in simple terms that the counterparty will understand. So if your final model contains only three explanatory variables, you tell them that only three variables are used, and you show how each of these variables (and combinations thereof) contribute to the model. You draw analogies to models the counterparty can appreciate, and use that to explain.

Now, like analytics professionals can be divided into two kinds (as above), I think consumers of analytics can also be divided into two kinds. There are those that like to understand the model, and those that simply want to get into the insights. The former are better served by the complicating type analytics professionals, and the latter by the simplifying type. The other two combinations lead to disaster.

Like a good management consultant, I represent this problem using the following two-by-two:

[Figure: analytics2by2 – a two-by-two of analytics professional type (simplifying vs complicating) against consumer type]

As a principle, I like to explain models in a simplified fashion, so that the consumer can completely understand them and use them in whatever way they see appropriate. The more pragmatic among you, however, can take a guess at what type the consumer is and tweak your communication accordingly.