It’s not just about status

Rob Henderson writes that in general, relative to the value they add to their firms, senior employees are underpaid and junior employees are overpaid. This, he reasons, is because senior employees trade off money for status.

Quoting him in full:

Robert Frank suggests the reason for this is that workers would generally prefer to occupy higher-ranked positions in their work groups than lower-ranked ones. They’re forgoing more earnings to hold a higher-status position in their organization.

But this preference for a higher-status position can’t be satisfied for everyone within any given organization.

After all, 50 percent of the positions in any firm must always be in the bottom half.

So the only way some workers can enjoy the pleasure inherent in positions of high status is if others are willing to bear the dissatisfactions associated with low status.

The solution, then, is to pay the low-status workers a bit more than they are worth to get them to stay. The high-status workers, in contrast, accept lower pay for the benefit of their lofty positions.

I’m not sure I fully agree. Yes, I agree that higher-productivity employees are underpaid and lower-productivity employees are overpaid. However, I don’t think status fully explains it. There are also issues of variance, correlation and liquidity (there – I’m talking like a real quant now).

On the variance front – the higher you are in the organisation and the higher your salary, the greater the variance of your contribution to the organisation. For example, if you are being paid $350,000 (the number Henderson hypothetically uses), the actual value you are bringing to your firm might have a mean of $500,000 and a standard deviation of $200,000 (I’m pulling all these numbers out of thin air, with only a sanity check that risk pricing broadly holds).

On the other hand, if you are being paid $35,000, then it is far more likely that the average value you bring to the firm is $40,000 with a standard deviation of $5,000 (again numbers entirely pulled out of thin air). Notice the drastic difference in the coefficient of variation in the two cases.
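A quick back-of-envelope comparison of the two cases, using the same made-up numbers (a sketch in Python):

```python
# Coefficient of variation (sd / mean) of the value added, for the two
# hypothetical employees from the text. All numbers are made up.
senior = {"pay": 350_000, "mean_value": 500_000, "sd_value": 200_000}
junior = {"pay": 35_000, "mean_value": 40_000, "sd_value": 5_000}

for name, e in [("senior", senior), ("junior", junior)]:
    cv = e["sd_value"] / e["mean_value"]
    print(f"{name}: pay={e['pay']:,}, CV of value added={cv:.3f}")

# senior: CV = 0.400; junior: CV = 0.125
```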

Putting it another way, the more productive you are, the harder it is for any organisation to put a precise value on your contribution. Henderson might say “you are worth 500K while you earn 350K”, but the former is an average number. It is because of the high variance in your “worth” that you are paid far less than what you are worth on average.

And why does this variance exist? It’s due to correlation.

At higher-ranked positions especially (as an aside – my weird career path means that I’ve NEVER been in middle management), the value you can add to a company is tightly coupled with your interactions with your colleagues and peers. As a junior employee, your role can be defined well enough that your contributions are stable irrespective of how you work with others. At senior levels, though, a very large part of the value you can add is tied to how you work with others and leverage their work in your contributions.

So one way a company can get you to contribute more is to give you a good set of peers you like working with, which increases your average contribution to the firm. Rather paradoxically, because you like your peers (assuming peer liking in senior management is two-way), the company can get away with paying you a little less than your average worth, and you will continue to stick on. If you don’t like working with your colleagues, there is the double whammy that you will add less to the company and need to be paid more to stick on. And so if you look at people who are actually successful in their jobs at a senior level, they will all appear to be underpaid relative to the value they add.

And finally there is liquidity (can I ever theorise about something without bringing this up?). The more senior you get, the less liquid the market for your job. The number of potential jobs that you want to do, and which might want you, is very, very low. And as I’ve explained in the first chapter of my book, when a market is illiquid, the bid-ask spread can be rather high. This means that even holding the value of your contribution to a company constant, there can be a large variation in what you are actually paid. And that, again, is why, on average, senior employees are underpaid.

So yes, there is an element of status. But there are also considerations of variance, correlation and bid-ask spreads. And selection bias (senior employees who are overpaid relative to the value they add don’t last very long in their jobs). And this is why, on average, you can afford to underpay senior employees.

Christian Rudder and Corporate Ratings

One of the studdest book chapters I’ve read is from Christian Rudder’s Dataclysm. Rudder is a cofounder of OkCupid, now part of the match.com portfolio of matchmakers. In the book, he uses OkCupid’s own data to draw insights about human life and behaviour.

It is a typical non-fiction book, with a studmax first chapter and progressively weaker ones after that. And it is the first chapter (which I’ve written about before) that I’m going to talk about here. There is a nice write-up and extract on Maria Popova’s website (which used to be called BrainPickings) here.

Quoting Maria Popova:

What Rudder and his team found was that not all averages are created equal in terms of actual romantic opportunities — greater variance means greater opportunity. Based on the data on heterosexual females, women who were rated average overall but arrived there via polarizing rankings — lots of 1’s, lots of 5’s — got exponentially more messages (“the precursor to outcomes like in-depth conversations, the exchange of contact information, and eventually in-person meetings”) than women whom most men rated a 3.

In one-hit markets like love (you only need to love and be loved by one person to be “successful” in this), high volatility is an asset. It is like option pricing, if you think about it – higher volatility means a greater chance of being in the money, and that is all you care about here. How deep out of the money you end up just doesn’t matter.
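A minimal sketch of that intuition, using the textbook Black-Scholes call price (the specific numbers are arbitrary):

```python
# Black-Scholes price of an out-of-the-money call as a function of volatility.
# When all that matters is finishing in the money, more volatility always helps.
from math import exp, log, sqrt
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

# deep out-of-the-money: spot 100, strike 150, one year, zero rates
for sigma in [0.1, 0.2, 0.4, 0.8]:
    print(f"sigma={sigma:.1f}: call price={bs_call(100, 150, 1.0, 0.0, sigma):.2f}")
# the price rises monotonically with sigma
```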

I was thinking about this in some random context this morning when I was also thinking of the corporate appraisal process. Now, the difference between dating and appraisals is that on OkCupid you might get several ratings on a 5-point scale, but in your office you only get one rating each year on a 5-point scale. However, if you are a manager, and especially if you are managing a large team, you will GIVE out lots of ratings each year.

And so I was wondering – what does the variance of the ratings you give out say about you as a manager? Assuming that HR doesn’t impose any “grading on a curve” policy, what does it say if you are a manager who gives out an average rating of 3 with a standard deviation of 0.5, versus a manager whose ratings also average 3, but with all employees receiving 1s and 5s?

From a corporate perspective, would you rather have a team full of 3s, or a team with a few 5s and a few 1s (who, it is likely, will leave)? Once again, if you think about it, it depends on your Vega (returns to volatility). In some sense, it depends on whether you are running a stud or a fighter team.

If you are running a fighter team, where there is no real “spectacular performance” but you need your people to grind it out, not make mistakes, pay attention to detail and do their jobs, you want a team full of 3s. The 5s in this team don’t contribute that much more than a 3. And 1s can seriously hurt your performance.

On the other hand, if you’re running a stud team, you will want high variance. Because by the sheer nature of work, in a stud team, the 5s will add significantly more value than the 1s might cause damage. When you are running a stud team, a team full of 3s doesn’t work – you are running far below potential in that case.
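Here is a hedged sketch of that Vega argument, with payoff functions invented purely for illustration (capped for “fighter” work, convex for “stud” work):

```python
# Expected team output for two rating distributions with the same mean (3)
# under two payoff shapes. The payoff functions are made up for illustration.
fighter_value = lambda r: min(r, 3)   # 5s add nothing over 3s; 1s hurt
stud_value = lambda r: 2 ** r         # 5s add far more than 1s cost

steady_team = [3, 3, 3, 3]            # all average
volatile_team = [1, 5, 1, 5]          # same mean rating, high variance

for work, value in [("fighter", fighter_value), ("stud", stud_value)]:
    for label, team in [("steady", steady_team), ("volatile", volatile_team)]:
        avg = sum(map(value, team)) / len(team)
        print(f"{work} work, {label} team: average output {avg:.1f}")

# fighter work: the steady team (3.0) beats the volatile team (2.0)
# stud work: the volatile team (17.0) beats the steady team (8.0)
```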

Assuming that your team has delivered, maybe the distribution of ratings across the team is a function of whether it does more stud or fighter work? Or am I force-fitting my pet theory a bit too much here?

Distribution of political values

Through Baal on Twitter I found this “Political Compass” survey. I took it, and it said this is my “political compass”.

Now, I’m not happy with the result. I mean, I’m okay with the average value where the red dot has been put for me, and I think that represents my political leanings rather well. However, what I’m unhappy about is that my political views have all been reduced to one single average point.

I’m pretty sure that, based on all the answers I gave in the survey, my political leaning along each of the two axes follows a distribution, and the red dot here is only the average (mean, I guess, but it could also be the median) value of that distribution.

However, there are many ways in which people can have a political view that lands right on my dot. Some people might have a consistent but mild political view in favour of or against a particular position. Others might have pretty extreme views – for example, some of my answers might lead you to believe that I’m an extreme right-winger, and others might make me look like a Marxist (I believe I have a pretty high variance on both axes around my average value).

So what I would have liked instead from the political compass was a sort of heat map, or at least two marginal distributions, showing how I’m distributed along the two axes, rather than all my views being reduced to one average value.
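Something like this, say (the per-question scores below are invented – I don’t know how the Political Compass actually scores individual answers):

```python
# Summarising hypothetical per-question scores as distributions rather than
# a single mean. The scores are invented; the survey's actual scoring differs.
import numpy as np

economic = np.array([-8, 6, -7, 5, -6, 7, -5, 4])  # left (-) / right (+)
social = np.array([-3, 4, -5, 6, -2, 3, -4, 5])    # libertarian (-) / authoritarian (+)

for axis, scores in [("economic", economic), ("social", social)]:
    print(f"{axis}: mean={scores.mean():.1f}, sd={scores.std():.1f}, "
          f"range=[{scores.min()}, {scores.max()}]")
# A mild mean can hide a huge spread; a heat map or a pair of histograms
# would show this, while a single dot cannot.
```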

A version of this is the main argument of a book I read recently called “The End of Average”: when we design for “the average man” or “the average customer”, and do so across several dimensions, we end up designing for nobody, since nobody is average when looked at on many dimensions.

Dimensional analysis in stochastic finance

Yesterday I was reading through Ole Peters’s lecture notes on ergodicity, a topic that I got interested in thanks to my extensive use of Utility Theory in my work nowadays. And I had a revelation – that in standard stochastic finance, mean returns and standard deviation of returns don’t have the same dimensions. Instead, it’s mean returns and the variance of returns that have the same dimensions.

While this might sound counterintuitive, it is not hard to see if you think about it analytically. We will start with what is possibly the most basic equation in stochastic finance, which is the lognormal random walk model of stock prices.

dS = \mu S dt + \sigma S dW

This can be rewritten as

\frac{dS}{S} = \mu dt + \sigma dW

Now, let us look at dimensions. The LHS divides change in stock price by stock price, and is hence dimensionless. So the RHS needs to be dimensionless as well if the equation is to make sense.

It is easy to see that the first term on the RHS is dimensionless – \mu, the average return or the drift, is defined as “returns per unit time”. So a stock that returns, on average, 10% in a year returns 20% in two years. Returns thus have dimensions t^{-1}, and multiplying by dt, which has dimensions of time, renders the term dimensionless.

That leaves us with the last term. dW is the increment of a Wiener process, and is defined such that dW^2 = dt. This implies that dW has dimensions \sqrt{t}. The equation is thus meaningful if and only if \sigma has dimensions t^{-\frac{1}{2}}, which is the same as saying that \sigma^2 has dimensions \frac{1}{t} – the same as the dimensions of the mean returns.

It is not hard to convince yourself that this makes intuitive sense as well. The basic assumption of a random walk is that the variance grows linearly with time (another way of seeing this is that when you add two uncorrelated random variables, their variances add up to give the variance of the sum). From this again, the variance rate \sigma^2 has units of inverse time – the same as the mean returns.
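If you prefer simulation to algebra, here is a quick sketch (parameters arbitrary) verifying that the variance of log-returns grows linearly with the horizon:

```python
# Simulate lognormal random walks and check that the variance of log-returns
# grows as sigma^2 * T. Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, dt, n_paths = 0.10, 0.20, 1 / 252, 100_000

for horizon_days in [63, 126, 252]:
    increments = rng.normal((mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt),
                            size=(n_paths, horizon_days))
    log_returns = increments.sum(axis=1)   # sum of i.i.d. increments
    T = horizon_days * dt
    print(f"T={T:.2f}y: var={log_returns.var():.4f}, sigma^2 * T={sigma**2 * T:.4f}")
# the two columns match: variance scales linearly with time
```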

Finally, speaking of dimensional analysis and Ole Peters, check out his proof of the Pythagoras Theorem using dimensional analysis.
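My paraphrase of the argument (a sketch, not Peters’s exact presentation): the area of a right triangle is determined by its hypotenuse c and one acute angle \theta, and since area has dimensions of length squared, it must be of the form A = c^2 f(\theta). The altitude from the right angle splits the triangle into two similar triangles with hypotenuses a and b and the same acute angle, and the areas add up, so

c^2 f(\theta) = a^2 f(\theta) + b^2 f(\theta)

and dividing through by f(\theta) gives c^2 = a^2 + b^2.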

Isn’t it beautiful?

PS: Speaking of dimensional analysis, check out my recent post on stocks and flows and financial ratios.

 

Standard deviation is over

I first learnt about the concept of standard deviation sometime in 1999, when we were being taught introductory statistics in class 12. It was classified under the topic of “measures of dispersion”, and after we had learnt the concepts of “mean deviation from median” (and that “mean deviation from mean” is identically zero) and “mean absolute deviation”, the teacher slipped in the concept of the standard deviation.

I remember being taught the mnemonic of “railway mail service” to remember that the standard deviation was “root mean square” (RMS! get it?). Calculating the standard deviation was simple. You took the difference between each data point and the average, and then it was “root mean square” – you squared these differences, took their arithmetic mean and then took the square root.
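In code, the “railway mail service” recipe is literally three steps:

```python
# "Root mean square": square the deviations, take the mean, take the root.
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]   # any list of numbers
mean = sum(data) / len(data)
deviations = [x - mean for x in data]
sd = sqrt(sum(d * d for d in deviations) / len(deviations))
print(mean, sd)                    # 5.0 2.0
```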

Back then, nobody bothered to tell us why the standard deviation was significant. Later, in engineering, someone (wrongly) told us that you square the deviations so that you can account for negative numbers (if that were true, the MAD would be equally serviceable). A few years later, learning statistics at business school, we were told (rightly this time) that the standard deviation was significant because it doubly penalises outliers. A few days after that, we learnt hypothesis testing, which used the bell curve. “Two standard deviations include 95% of the data”, we learnt, and blindly applied it to all data sets – the problems we encountered in examinations only dealt with data sets that were actually normally distributed. It was much later that we figured that the number six in “six sigma” was literally pulled out of thin air, as a dedication to Sigma Six, a precursor of Pink Floyd.

Somewhere along the way, we learnt that the specialty of the normal distribution is that it can be uniquely described by just its mean and standard deviation. One look at the formula for its PDF tells you why:
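f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}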

Most introductory stats lessons are taught from the point of view of using stats to do science. In the natural world, and in science, a lot of things are normally distributed (hence it is the “normal” distribution). Thus, learning statistics using the normal distribution as a framework is helpful if you seek to use it to do science. The problem arises, however, when you assume that everything is normally distributed, as a lot of people who learn their statistics through the normal distribution end up doing.

When you step outside the realms of natural science, however, you are in trouble if you blindly use the standard deviation, and consequently, the normal distribution. For in such realms, the normal distribution is seldom normal. Take, for example, stock markets. Most popular financial models assume that the movement of the stock price is either normal or lognormal (the famous Black-Scholes equation uses the latter assumption). In certain regimes these might be reasonable assumptions, but pretty much anyone who has followed the markets closely knows that stock price movements have “fat tails”, and thus the lognormal assumption is not a great one.
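To see how much the tails matter, compare the probability of a 4-sigma move under a normal distribution with that under a fat-tailed alternative – here a Student’s t with 3 degrees of freedom, rescaled to unit variance and picked purely for illustration:

```python
# Tail probability of a 4-sigma move: normal vs a fat-tailed Student's t.
# The t with 3 degrees of freedom is illustrative, not calibrated to any market.
from scipy.stats import norm, t

df = 3
scale = (df / (df - 2)) ** 0.5    # a t(3) variable has standard deviation sqrt(3)
print(norm.sf(4))                 # ~3.2e-05
print(t.sf(4 * scale, df))        # ~3.1e-03: roughly a hundred times more likely
```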

At least the stock price movement looks somewhat normal (apart from the fat tails). What if you are doing some social science research and are looking at, for example, data on people’s incomes? Do you think it makes sense at all to define a standard deviation for the incomes of a sample of people? Going further, do you think it makes sense at all to compare the dispersion in incomes across two populations by measuring the standard deviations of incomes in each?

I was once talking to an organisation which was trying to measure and influence salesperson efficiency. In order to do this, again, they were looking at the mean and standard deviation. Given that the sales of one salesperson can be an order of magnitude greater than those of another (given the nature of their product), this made absolutely no sense!

The emphasis on the standard deviation in our education means that most people know only one way to measure dispersion. When you know one method to measure something, you are likely to apply it irrespective of whether it is the appropriate method given the circumstances. It leads to the proverbial hammer-nail problem.

What we need to understand is that the standard deviation makes sense only for some kinds of data. Yes, it is mathematically defined for any set of numbers, but it makes physical sense only when the data is approximately normally distributed. When data doesn’t fit such a distribution (and more often than not it doesn’t), the standard deviation makes little sense!
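For instance, on skewed data such as incomes, quantile-based measures of dispersion tell a much saner story than the standard deviation (a sketch with simulated data):

```python
# On lognormal "incomes", the standard deviation is dominated by a handful of
# outliers, while quantile-based measures stay stable. Data is simulated.
import numpy as np

rng = np.random.default_rng(0)
incomes = np.exp(rng.normal(10, 1, size=10_000))   # heavily right-skewed

print(f"mean={incomes.mean():,.0f}, sd={incomes.std():,.0f}")
print(f"median={np.median(incomes):,.0f}, "
      f"IQR={np.percentile(incomes, 75) - np.percentile(incomes, 25):,.0f}")

# add a single billionaire: the sd explodes, the IQR barely moves
print(f"sd with one outlier: {np.append(incomes, 1e9).std():,.0f}")
```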

For those that noticed, the title of this post is a dedication to Tyler Cowen’s recent book.