Structures of professions and returns to experience

I’ve written here a few times about the concept of “returns to experience”. Basically, in some fields such as finance, the returns to experience are rather high. Irrespective of what you have studied, or where, how long you have continuously been in the industry and what you have been doing there have a bigger impact on your performance than your way of thinking or your education.

In other domains, the returns to experience are far lower. After a few years in the profession, you will have learnt all you need to, and working longer in the job will not necessarily make you better at it. And so the average person with 15 years of experience is not that much better than the average person with 10 years of experience, and salaries stagnate as careers progress.

While I have spoken about returns to experience, until now I hadn’t bothered to figure out why it is a thing in some, and only some, professions. And then I came across this tweetstorm that seeks to explain it.

Now, normally I have a policy of not reading tweetstorms longer than six tweets, but here it was well worth it.

It draws upon a concept called “cognitive flexibility theory”.

Basically, there are two kinds of professions – well-structured and ill-structured. To quickly summarise the tweetstorm: in well-structured professions, the same problems appear again and again, and there are clear patterns. In these professions, first principles are enough to reason out most things and solve most problems, and so the way you learn the profession is by learning concepts and theories and solving a few problems.

In ill-structured domains (eg. business or medicine), the concepts are largely the same, but the way they manifest in different cases is vastly different. As a consequence, just knowing the theories or fundamentals is not sufficient to understand most cases, each of which is idiosyncratic.

Instead, study in these professions comes from “studying cases”. Business and medical schools are classic examples of this. The idea behind solving lots of cases is NOT that you will see the same patterns in a new case, but that, having seen lots of cases, you will be better able to reason HOW to approach a new case that comes your way (and the way you approach it is very likely novel).

Picking up from the tweetstorm once again:


It is not hard to see that when the problems are ill-structured or “wicked”, the more cases you have seen in your life, the better placed you are to attack a new problem. Naturally, assuming you continue to learn from each incremental case you see, the returns to experience in such professions are high.

In securities trading, for example, the market takes very many forms, and irrespective of what chartists will tell you, patterns seldom repeat. The concepts, however, are the same. Hence, you treat each new trade as a “case” and try to learn from it. So returns to experience are high – and that is why, when I tried to re-enter the industry after five years away, I found it incredibly hard.

Chess, on the other hand, is well-structured. Yes, an AlphaZero might come and go, but a lot of the general principles simply remain.

Having read this tweetstorm, gulped down a large glass of wine and written this blogpost (so far), I’ve been thinking about my own profession – data science. My sense is that data science is an ill-structured profession where most practitioners pretend it is well-structured. And this is possibly because a significant proportion of practitioners come from academia.

I keep telling people about my first brush with what can now be called data science – back in 2006-7, I was asked to build a model to forecast demand for air cargo. The said demand being both intermittent (one order every few days for a particular flight) and lumpy (a single order could fill up a flight, for example), it was an incredibly wicked problem.

Having had a rather unique career path in this “industry”, I have, over the years, been exposed to a large number of unique “cases”. In 2012, I’d set about trying to identify patterns so that I could “productise” some of my work, but the ill-structured nature of the problems I was taking up meant this simply wasn’t forthcoming. And I realise (after having read the above-linked tweetstorm) that I continue to learn from cases, and that I’m a much better data scientist than I was a year ago, and much, much better than I was two years ago.

On the other hand, because data science attracts a lot of people from pure science and engineering (classically well-structured fields), you see a lot of people trying to apply overly academic or textbook approaches to the problems they see. As they try to divine problem patterns that don’t really exist, they fail to recognise novel “cases”. And so they don’t really learn from their experience.

Maybe this is why I keep saying that “in data science, years of experience and competence are not correlated”. However, fundamentally, that ought NOT to be the case.

This is also perhaps why a lot of data scientists, irrespective of their years of experience, continue to remain “junior” in their thinking.

PS: The last few paragraphs apply equally to quantitative finance and economics. They are ill-structured professions that some practitioners (thanks to their well-structured backgrounds) assume are well-structured.

A one in a billion trillion event

It seems like capital markets quants have given up on the lognormal model for good, for nobody described Facebook’s stock price drop last Thursday as a “one in a billion trillion event”. For that is the approximate probability of it happening, if we were to assume a lognormal model of the market.

[Chart: FB stock price. Created using the Quantmod package; data from Yahoo.]

Somewhat arbitrarily, we will use 90 days of trailing data to calculate the mean and volatility of the stock’s returns. As of last Thursday (the day of the fall), the daily mean return for FB was 0.204%, or an annualised return of 51.5% (as you can see, very impressive!). The daily volatility in the stock (using the same 90-day lookback period) was 1.98%, or an annualised volatility of 31.4%. While that is a tad on the higher side, it is okay considering the annual return of 51.5%.
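
As a rough sketch of that calculation – assuming `fb_returns` holds FB’s daily returns for the 90 trading days leading up to the fall (the variable name and the 252-trading-day annualisation convention are my assumptions, not taken from the chart above):

```r
mu_daily   <- mean(fb_returns)        # ~0.00204, i.e. 0.204% per day
vol_daily  <- sd(fb_returns)          # ~0.0198, i.e. 1.98% per day
mu_annual  <- mu_daily * 252          # roughly the 51.5% annualised return quoted above
vol_annual <- vol_daily * sqrt(252)   # roughly the 31.4% annualised volatility quoted above
```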

Now, traditional quantitative finance models have all used a lognormal distribution to represent stock prices, which implies that the distribution of (log) stock price returns is normal. Under such an assumption, the likelihood of an 18.9% drop in the value of Facebook (which is what we saw on Thursday) is very small indeed.

In fact, to be precise, when the stock is returning 0.204% per day with a volatility of 1.98% per day, an 18.9% drop is a 9.7 sigma event. In other words, if the distribution of returns were normal, Thursday’s drop was nearly ten sigmas away from the mean. Remember that most quality control systems (admittedly in industrial settings, where faults are indeed governed by a nearly normal distribution) are set to a six sigma limit.

Another way to look at Thursday’s 9.7 sigma event is that, again under the normal distribution, the likelihood of seeing this kind of a fall in a day is about $10^{-21}$. Or one in a billion trillion. In terms of the number of trading days required for such a fall to arrive at random, it is of the order of a billion billion years, which is many orders of magnitude more than the age of the universe!
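
A back-of-the-envelope version of that calculation, plugging in the daily figures quoted above (the normality assumption is, of course, exactly what is in question):

```r
mu_daily  <- 0.00204                 # daily mean return (0.204%)
vol_daily <- 0.0198                  # daily volatility (1.98%)
drop      <- -0.189                  # Thursday's 18.9% fall

z <- (drop - mu_daily) / vol_daily   # about -9.7: a "9.7 sigma" move
p <- pnorm(z)                        # tail probability if daily returns were normal;
                                     # of the order of one in a billion trillion
(1 / p) / 252                        # expected wait, in trading years, for one such day
```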

In fact, when the 1987 stock market crash (Black Monday) happened, this was the defence the quants gave for losing their banks’ money – that it was an incredibly improbable event. Now, my reading of the papers nowadays is sketchy, and I mostly consume news via Twitter, but I haven’t heard a single such defence from quants who lost money in the Facebook crash. In fact, I haven’t come across too many stories of people who lost money in the crash.

Maybe it’s the power of diversification, and of indexing, thanks to which Facebook is now only a small portion of most people’s portfolios. A 20% drop in a stock that makes up even 10% of your portfolio erodes your wealth by only 2%, which is tolerable. What possibly caused traders to jump out of windows on Black Monday was that it was a secular drop in the entire US market.

Or maybe the lessons learnt from Black Monday have been internalised and included in models in the 30 years since (remember that concepts such as volatility smiles and skews, and stochastic volatility, were introduced in the wake of the 1987 crash).

That a 20% drop in one of the five biggest stocks in the United States didn’t make for “human stories”, or stories about a “one in a billion trillion event”, is itself a story! Or maybe my reading of the papers is heavily biased!

PostScript

Even after the spectacular drop, the Facebook stock at the time of this update is trading at 168.25, a level last seen exactly three months ago – on 26th April, following the previous quarter’s results. That barely three months’ worth of gains have been wiped out by such a massive crash suggests that the only people to have lost from the crash are traders who wrote out-of-the-money puts.

Studying on Coursera

Over the last year or more, I’ve signed up for, and dropped out of, at least a dozen Coursera courses. The problem has been that the video lectures have not kept me engaged. I seem to multitask while watching these videos, and the sheer volume of videos in some of these courses has been such that I’ve quickly fallen behind and then lost interest. I must, however, admit that many of these courses haven’t been particularly challenging. In courses such as “Model Thinking” or “Social Network Analysis” I already knew a lot of the material, and thus lost interest. Modern World History (by Philip Zelikow) was more of an information-only course, which I could have consumed better in the form of a book.

Given that I’ve had bursts of signing up for courses and then not following up on them, for the last six months I’ve avoided signing up for any new courses. Until two weeks back when, on a reasonably jobless evening during a visit to my client’s Mumbai office, I decided to sign up for this course on Asset Pricing. And what a course it has been so far!

I went to bed close to midnight last night. I watched neither the Champions League final nor Arsenal’s draw at West Brom. I was doing my assignments. I spent three hours on a Sunday evening doing my assignments of the coursera Asset Pricing course, offered by Prof John Cochrane of the University of Chicago.

I’ve only completed the assignments of “Week 0” of the eight-week long course, and have watched the lectures of “Week 1” and I’m hooked already. I must admit that nobody has taught me finance like this so far. In IIM Bangalore, where I got my MBA seven years ago, we had a course on microeconomics, a course on corporate finance and a course on financial derivatives (elective). The problem, however, was that nobody made the links between any of these.

We studied the concept of marginal utility in economics, but none of the finance professors touched it. In corporate finance, we touched upon CAPM and Modigliani-Miller, but none of the later finance courses referred to them. There was a derivation of the Black-Scholes pricing model in the derivatives course, but it didn’t touch upon any of the other finance we had learnt. In short, we had just been provided with the components, and nobody had helped us connect them.

The beauty of the Chicago course is that it is holistic and so well connected. The same professor, in the same course, teaches us diffusions in one lecture and, in another, uses marginal utility theory from economics to explain the concept of interest rates. In one assignment he has us running regressions, and in others we do stochastic calculus. Having so far seen each of these concepts only separately, I’m absolutely enjoying all the connections, and that is perhaps what is keeping my interest in the course alive.

And it is a challenging course. It is a PhD level course at Chicago (current students at the university are taking the course in parallel with us online students) and my complacency was shattered when I got 3.5 out of 11 in my first quiz. It assumes a certain proficiency in both finance and math, and then builds on it, in a way no finance course I’ve ever taken did.

Also what sets the course apart is the quality of the assignments. Each assignment makes you think, and makes you do. For example, in one assignment I did last night, I had to run a set of regressions and then report the t-values and R^2 values. In another, I had to plot a graph (which I did using Excel) and then report certain points from it. Other assignments make sure you have internalised what was taught in the lectures. It has been extremely exciting so far.

Based on my experience with the course so far, I hope my enthusiasm will last. I don’t know if this course will help me directly in my profession. However, there is no doubt that it keeps me intellectually honest and keeps me sharp. I may not have had the option to take too many such courses during my formal education, but I hope I can set that right on Coursera.

Hedgehogs and foxes: Or, a day in the life of a quant

I must state at the outset that this post is inspired by the second chapter of Nate Silver’s book The Signal and the Noise. In that chapter, which is about election forecasting, Silver draws upon the old Russian parable of the hedgehog and the fox. According to that story, the fox knows several tricks while the hedgehog knows only one – curling up into a ball. The story ends in favour of the hedgehog, as none of the tricks of the unfocused fox can help him evade the predator.

Most political pundits, says Silver, are like hedgehogs. They have just one central idea to their punditry, and they tend to analyze all issues through that lens. A good political forecaster, however, needs to be able to accept and process any new data input and include it in his analysis. With just one technique this can be hard to achieve, and so Silver says that to be a good political forecaster one needs to be a fox. While this might lead to some contradictory statements and thus bad punditry, it leads to good forecasts. Anyway, you can read more about election forecasting in Silver’s book.

The world of “quant” and “analytics” that I inhabit is similarly full of hedgehogs. You have the statisticians, whose solution for every problem is a statistical model, and who can wax eloquent about maximum likelihood estimators but have trouble explaining why you should use them in the first place. Then you have the banking quants (I used to be one of those), who are proficient in derivatives pricing, stochastic calculus and partial differential equations, but who don’t have an answer if you ask them why stock price movements are generally assumed to be lognormal. Then you have the coders, who can hack, scrape and write really efficient code, but don’t know much math. And the mathematicians, who can come up with elegant solutions but are divorced from reality.

While you might make a career out of falling into any one of the above categories, to truly unleash your potential as a quant you should be able to do all of them. You should be a fox and know each of these tricks. And unlike the fox in the old Russian fairy tale, a good fox knows which trick to use when. Let me illustrate this with an example from my work today (the actual problem statement is masked since it involves client information).

So there were two possible distributions that a particular data point could have come from, and I had to analyze which of them it came from (simple Bayesian probability, you might think). However, calculating the probability wasn’t so straightforward, as it wasn’t a standard function. Then I figured I could solve the probability problem using the inclusion-exclusion principle (maths again), and wrote down a mathematical formulation for it.
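
The actual distributions in the client problem are masked, so here is a purely illustrative sketch of the Bayesian step (the normal densities below are stand-ins):

```r
# Posterior probability that a data point x came from distribution 1 rather than
# distribution 2: prior times likelihood, normalised over the two candidates.
posterior_dist1 <- function(x, prior1 = 0.5) {
  lik1 <- dnorm(x, mean = 0, sd = 1)     # stand-in density for distribution 1
  lik2 <- dnorm(x, mean = 2, sd = 1.5)   # stand-in density for distribution 2
  prior1 * lik1 / (prior1 * lik1 + (1 - prior1) * lik2)
}

posterior_dist1(1.2)   # probability that the point came from distribution 1
```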

Now, I was dealing with a rather large data set, so I would have to use the computer, and so I turned my mathematical solution into pseudo-code. Then I realised that the pseudo-code was recursive, and that given the size of the problem I would soon run out of memory, so I had to figure out a solution using dynamic programming. Then, following some more code optimisation, I had the probability. And then I had to go back and do the Bayesian analysis in order to complete the solution, and finally present it in the form of a “business solution”, with all of the above mathematical jugglery abstracted away from the client.
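
The real formulation is masked, but the recursion-to-dynamic-programming step can be illustrated with a made-up stand-in problem – say, the probability of exactly k successes in n independent trials:

```r
# A direct recursive implementation of P(n, k) = p * P(n-1, k-1) + (1-p) * P(n-1, k)
# recomputes subproblems and recurses deeply; filling the same table bottom-up
# needs only O(n * k) time and memory.
prob_k_successes <- function(n, k, p) {
  dp <- matrix(0, nrow = n + 1, ncol = k + 1)  # dp[i+1, j+1] = P(j successes in i trials)
  dp[1, 1] <- 1                                # zero trials, zero successes
  for (i in 1:n) {
    for (j in 0:min(i, k)) {
      dp[i + 1, j + 1] <- (1 - p) * dp[i, j + 1] +
        (if (j > 0) p * dp[i, j] else 0)
    }
  }
  dp[n + 1, k + 1]
}

prob_k_successes(10, 3, 0.3)   # agrees with dbinom(3, 10, 0.3)
```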

This versatility can come in handy in other places, too. There was a problem for which I figured that the appropriate solution involved building a classification tree. However, given the nature of the data at hand, none of the off-the-shelf classification tree algorithms was ideal. So I simply went ahead and wrote my own code for creating such trees. Then I figured that classification trees are in effect a greedy algorithm, and can get stuck at local optima, and so I added a simulated annealing aspect to it.
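
The tree code itself is not in the post, but the simulated annealing idea is generic enough to sketch: occasionally accept a worse candidate, with a probability that shrinks as a “temperature” cools, so the search can escape the local optima a purely greedy procedure gets stuck in. Everything below (the function, the toy usage) is illustrative, not the code described above:

```r
# Generic simulated annealing skeleton: `neighbour` proposes a small perturbation
# of the current solution, `cost` is the quantity being minimised.
simulated_annealing <- function(initial, neighbour, cost,
                                temp = 1, cooling = 0.995, iters = 5000) {
  current <- initial
  best <- current
  for (i in seq_len(iters)) {
    candidate <- neighbour(current)
    delta <- cost(candidate) - cost(current)
    # Always accept improvements; accept worse moves with probability exp(-delta / temp)
    if (delta < 0 || runif(1) < exp(-delta / temp)) current <- candidate
    if (cost(current) < cost(best)) best <- current
    temp <- temp * cooling
  }
  best
}

# Toy usage: minimise (x - 2)^2 starting from x = 10
simulated_annealing(10, function(x) x + rnorm(1, sd = 0.5), function(x) (x - 2)^2)
```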

While I may not have in-depth knowledge of any of the above techniques (to gain breadth you have to sacrifice depth), being aware of a wide variety of techniques means I can provide the solution that is best suited to the problem at hand. And as I go along, I hope to keep learning more and more techniques – even if I don’t use them, being aware of them will lead to better overall problem solving.

Why standard deviation is not a good measure of volatility

Most finance textbooks, at least the ones that are popular in business schools, use standard deviation as the measure of volatility of a stock price. In this post, we will examine why that is not a great idea. To put it in one line: using standard deviation loses information about the ordering of the price movements.

As earlier, let us look at two data sets and try to measure their volatility. Consider two time series (let’s simply call them “series 1” and “series 2”) and compare their volatilities. The table here shows the two series:

[Table: series 1 and series 2]

What can you say of the two series now? Do you think they are similar? You might notice that both contain the same set of numbers, only jumbled up. Now let us look at the volatility as expressed by standard deviation. Unsurprisingly, since both series contain the same set of numbers, the standard deviation of the two series is identical – 8.655 in each case.

However, does this mean that the two series are equally volatile? Not at all, as you can see from this graph of the two series:

[Figure: plot of series 1 and series 2 over time]

It is clear from the graph (if it was not already clear from the table) that series 2 is much more volatile than series 1. So how can we measure that? Most textbooks on quantitative finance (as opposed to textbooks on finance) use “quadratic variation” as a measure of volatility. How do we measure quadratic variation?

If we have a series of numbers from $a_1$ to $a_n$, then the quadratic variation of this series is measured as

$\sum_{i=2}^{n} (a_i - a_{i-1})^2$

Notice that the defining feature of quadratic variation is that it takes the sequence into account. So when you have something like series 2, with alternating positive and negative jumps, that gets captured in the quadratic variation. So what are the quadratic variation values for the two time series we have here?

The QV of series 1 is 29 while that of series 2 is a whopping 6119, which is probably a fair indicator of their relative volatilities.
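
A minimal sketch of the comparison (the series below are made up, not the ones from the table above; the point is only that `sd` ignores ordering while quadratic variation does not):

```r
quadratic_variation <- function(x) sum(diff(x)^2)

series1 <- c(10, 12, 15, 18, 21, 24, 27, 30, 33, 35)  # gently trending series
series2 <- c(10, 30, 12, 33, 15, 35, 18, 27, 21, 24)  # same numbers, jumbled up

sd(series1); sd(series2)                                     # identical: sd ignores the ordering
quadratic_variation(series1); quadratic_variation(series2)   # far larger for the jumbled series
```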

So why standard deviation?

Now you might ask why textbooks use standard deviation at all, if it misses out on so much of the variation. The answer, not surprisingly, lies in quantitative finance. When the price of a stock $X$ is governed by a Wiener process, i.e.

$dX = \sigma dW$

then the quadratic variation of the stock price between time 0 and time $t$ can be shown to be $\sigma^2 t$, which for $t = 1$ is $\sigma^2$ – the variance of the process.

Because for a particular kind of process, which is commonly used to model stock price movement, the quadratic variation is equal to variance, variance is commonly used as a substitute for quadratic variation as a measure of volatility.
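
A quick simulation sketch of this equivalence, discretising $dX = \sigma dW$ over $[0, 1]$ (the step count and seed are arbitrary):

```r
set.seed(42)
sigma <- 0.2
n     <- 10000                      # time steps over [0, 1]
dW    <- rnorm(n, mean = 0, sd = sqrt(1 / n))
X     <- cumsum(sigma * dW)         # discretised path of dX = sigma dW

sum(diff(c(0, X))^2)                # realised quadratic variation: close to sigma^2
sigma^2                             # variance of X at t = 1
```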

However, considering that in practice stock prices are seldom Brownian (either arithmetic or geometric), this equivalence doesn’t necessarily hold.

This is also a point that Benoit Mandelbrot makes in his excellent book The (Mis)Behaviour of Markets. He calls this the Joseph effect (after the biblical story in which Joseph interpreted Pharaoh’s dream of seven fat cows being eaten by seven lean cows as a prediction that seven years of plentiful Nile floods would be followed by seven years of drought). Financial analysts, by using a simple variance (or standard deviation) to characterize volatility, miss out on such serial effects.