Compensation at the right tail

Yesterday I was reading this article ($) about how Liverpool FC is going about (not) retaining its star forwards Sadio Mane and Mo Salah, who have been key parts of the team that has (almost) “cracked it” in the last 5 seasons.

One of the key ideas in the (paywalled) piece is that Liverpool is more careful about spending on its players than other top contemporary clubs. As Oliver Kay writes:

[…] the Spanish club have the financial strength to operate differently — retaining their superstars well into their 30s and paying them accordingly until they are perceived to have served their purpose, at which point either another A-list star or one of the most coveted youngsters in world football (an Eder Militao, an Eduardo Camavinga, a Vinicius Junior, a Rodrygo and perhaps imminently, an Aurelien Tchouameni) will usually emerge to replace them.

In an ideal world, Liverpool would do something similar with Salah and Mane, just as Manchester City did with Vincent Kompany, Fernandinho, Yaya Toure, David Silva and Sergio Aguero — and as they will surely do with De Bruyne.

But the reality is that the Merseyside club are more restricted. Not dramatically so, but restricted enough for Salah, Mane and their agents to know there is more to be earned elsewhere, and that presents a problem not just when it comes to retaining talent but also when it comes to competing for the signings that might fill the footsteps of today’s heroes.

To go back to fundamentals, earnings in sport follow a power law distribution – a small number of elite players make a large portion of the money. And the deal with the power law is that it is self-similar – you can cut off the distribution at any arbitrary amount, and what remains to the right is still a power law.

So income in football follows a power law. Income in elite football also follows the same power law. The English Premier League is at the far right end of this, but wages there again follow a power law. If you look at really elite players in the league, again it is a (sort of – since number of data points would have become small by now) power law.
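The self-similarity claim can be checked on synthetic data. The following is a minimal sketch, not real wage data: the exponent of 2, the sample size and the cutoffs are all assumptions chosen for illustration, and the Hill estimator is one common way (among several) of estimating a tail exponent. If the claim holds, the estimated exponent should come out roughly the same no matter where the distribution is cut off.

```python
import math
import random

def hill_exponent(values, cutoff):
    """Maximum-likelihood (Hill) estimate of the tail exponent above `cutoff`."""
    tail = [v for v in values if v >= cutoff]
    return len(tail) / sum(math.log(v / cutoff) for v in tail)

rng = random.Random(0)
true_alpha = 2.0  # assumed exponent, for illustration only
# Synthetic "wages" drawn from a Pareto distribution with minimum value 1.
wages = [rng.paretovariate(true_alpha) for _ in range(200_000)]

# Cut the distribution off further and further to the right and re-estimate.
for cutoff in (1, 5, 20):
    print(cutoff, round(hill_exponent(wages, cutoff), 2))
```

The estimate gets noisier as the cutoff rises, because fewer data points remain, which is the "sort of" caveat about really elite players.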

What this means is that if you can define “marginal returns to additional skill”, at this far right end of the distribution it can be massive. For example, the article talks about how Salah has been offered a 50% hike (to make him the best paid Liverpool player ever), but that is still short of what some other (perceptibly less skilled) footballers earn.

So how do you go about getting value while operating in this kind of a market? One approach, which Liverpool seems to be taking, is to go Moneyball. “The marginal cost of getting a slightly superior player is massive, so we will operate not so far out at the right tail”, seems to be their strategy.

This means not breaking the bank for any particular player. It means ruthlessly assessing each player’s costs and benefits and acting accordingly (though sometimes it comes across as acting without emotion). For example, James Milner has just got an extension in his contract, but at lower wages to reflect his marginally decreased efficiency as he gets older.

Some of the other elite clubs (Real Madrid, PSG, Manchester City, etc.), on the other hand, believe that the premium for marginal quality is worth it, and so splurge on the elite players (including keeping them till fairly late in their careers even if it costs a lot). The rationale here is that the differences (to the “next level”) might be small, but these differences are sufficient to outperform compared to their peers (for example, Manchester City has won the league by one point over Liverpool twice in the last four seasons).

(Liverpool’s moneyball approach, of course, means that they try to get these “marginal advantages” in other (cheaper) ways, like employing a throw-in coach or neuroscience consultants.)

This approach is not without risk, of course. At the far right end of the tail, the variance in output can be rather high. Because the marginal cost of small increases in competence is so high, even if a player slightly underperforms, the effective monetary value of this underperformance is massive – you have paid for insanely elite players to win you everything, but they win you nothing.

And the consequences can be disastrous, as FC Barcelona found out last year.

In any case, the story doing the rounds now is that Barcelona want to hire Salah, but given their financial situation, they can’t afford to buy out his contract at Liverpool. So, they are hoping that he will run down his contract and join them on a free transfer next year. Then again, that’s what they had hoped from Gini Wijnaldum two years ago as well. And he’s ended up at PSG, where (to the best of my knowledge) he doesn’t play much.

I don’t know which 80%

Legendary retailer John Wanamaker (who pioneered fixed price stores in the mid 1800s) is supposed to have said that “half of all advertising is useless. The trouble is I don’t know which half”.

I was playing around with my twitter archive data, and was looking at the distribution of retweets and favourites across all my tweets. To say that it follows a power law is an understatement.

Before this blog post triggers an automated tweet, I have 63,793 tweets, of which 59,275 (93%) have not had a single retweet. 51,717 (81%) have not had a single person liking them. And 50,165 (79%) of all my tweets have not had a single retweet or a favourite.

In other words, nearly 80% of all my tweets had absolutely no impact on the world. They might as well have not existed. Which means that I should cut my time spent tweeting down to a fifth. Just that, to paraphrase Wanamaker, I don’t know which four fifths I should eliminate!

There is some good news, though. Over time, the proportion of my tweets that has no impact (in terms of retweets or favourites – the twitter dump doesn’t give me the number of replies to a tweet) has been falling consistently.

Right now, this month, the score is around 33% or so. So even though the proportion of my useless tweets has been dropping over time, even now one in every three tweets that I tweet has zero impact.

My “most impactful tweet” itself accounts for 17% of all retweets that I’ve got. Here I look at what proportion of tweets have accounted for what proportion of “reactions” (reactions for each tweet is defined as the sum of number of retweets and number of favourites. I understand that the same person might have retweeted and favourited something, but I ignore that bit for now).

Notice how extreme the graph is. 0.7% of all my tweets have accounted for 50% of all retweets and likes! 10% of all my tweets have accounted for 90% of all retweets and likes.
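For anyone who wants to compute the same kind of concentration number on their own archive, here is a minimal sketch. The function name and the toy data are made up for illustration; the real input would be one "reactions" count (retweets plus favourites) per tweet.

```python
def smallest_share_covering(reactions, target):
    """Fraction of items (largest first) needed to reach `target` share of all reactions."""
    ordered = sorted(reactions, reverse=True)
    total = sum(ordered)
    running = 0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target * total:
            return count / len(ordered)
    return 1.0

# Made-up toy data: 100 tweets, most with no reactions and one blockbuster.
toy_reactions = [0] * 80 + [1] * 15 + [5] * 4 + [80]

print(smallest_share_covering(toy_reactions, 0.5))  # share of tweets giving 50% of reactions
print(smallest_share_covering(toy_reactions, 0.9))  # share of tweets giving 90% of reactions
```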

Even if I look only at recent data, it doesn’t change shape that much – starting from January 2019, 0.8% of my tweets have accounted for 50% of all retweets and likes.

This, I guess, is the fundamental nature of social media. The impact of a particular tweet follows a power law with a very small exponent (meaning highly unequal).

What this also means is that anyone can go viral. Anyone can go from zero to hero in a single day. It is very hard to predict who is going to be a social media sensation some day.

So it’s okay that 80% of my tweets have no traction. I got one blockbuster, and who knows – I might have another some day. I guess such blockbusters are what we live for.

How power(law)ful is your job?

A long time back I’d written about how different jobs are sigmoidal to different extents – the most fighter jobs, I’d argued, have linear curves – the amount you achieve is proportional to the amount of effort you put in. 

And similarly I’d argued that the studdest jobs have a near vertical line in the middle of the sigmoid – indicating the point when insight happens. 

However what I’d ignored while building that model was that different people can have different working styles – some work like Sri Lanka in 1996 – get off to a blazing start and finish most of the work in the first few days. 

Others work like Pakistan in 1992 – put ned for most of the time and then suddenly finish the job at the last minute. Assuming a sigmoid does injustice to both these strategies since both these curves cannot easily be described using a sigmoidal function. 

So I revise my definition, and in order to do so, I use a concept from the 1992 World Cup – highest scoring overs. Basically take the amount of work you’ve done in each period of time (period can be an hour or day or week or whatever) and sort it in descending order. Take the cumulative sum. 

Now make a plot with an index on the X axis and the cumulative sum on the Y axis. The curve will look like that of a Pareto (80-20) distribution. Now you can estimate the power law exponent, and curves that are steeper in the beginning (greater amount of work done in fewer days) will have a lower power law exponent. 

And this power law exponent can tell you how stud or fighter the job is – the lower the exponent the more stud the job!! 
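The procedure can be sketched in a few lines. Everything below is illustrative: the two ten-day jobs are hypothetical, and the Hill-style estimator is my assumption for how one might put a number on the exponent, since the procedure above does not fix a particular estimation method.

```python
import math
from itertools import accumulate

def cumulative_work_curve(work_per_period):
    """Sort periods by output (largest first) and return the cumulative share of total work."""
    ordered = sorted(work_per_period, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

def tail_exponent(work_per_period):
    """Hill-style estimate of a power law exponent; lower means more concentrated output."""
    positive = [w for w in work_per_period if w > 0]
    smallest = min(positive)
    log_sum = sum(math.log(w / smallest) for w in positive)
    # Perfectly even output has no tail at all, so treat the exponent as infinite.
    return len(positive) / log_sum if log_sum > 0 else math.inf

# Hypothetical daily output for two ten-day jobs with the same total.
fighter = [10] * 10                      # steady effort every day
stud = [1, 1, 1, 1, 1, 1, 1, 1, 1, 91]   # one burst of insight

print(cumulative_work_curve(stud)[:3])
print(tail_exponent(fighter), tail_exponent(stud))
```

Because the values are sorted before anything else happens, a Sri Lanka 1996 style start and a Pakistan 1992 style finish produce the same curve, which is the point of the revised definition.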

Gossip Propagation Models

More than ten years ago, back when I was at IIT Madras, I considered myself to be a clearinghouse of gossip. Every evening after dinner I would walk across to Sri Gurunath Patisserie, and plonk myself at one of the tables there with a Rs. 5 Nescafe instant coffee. And there I would meet people. Sometimes we would discuss ideas (while these discussions were rare, they were most fulfilling). Other times we would discuss events. Most of the time, and in conversations that would be entertaining if not fulfilling, we discussed people.

Constant participation in such discussions made sure that any gossip generated anywhere on campus would reach me, and to fill time in subsequent similar conversations I would propagate them. I soon got to know about random details of random people on campus who I hardly cared about. Such information was important purely because someone else might find it interesting. Apart from the joy of learning such gossip, however, I didn’t get remunerated for my services as clearinghouse.

I was thinking about this topic earlier today while reading this studmax post that the wife has written about gossip distribution models. In it she writes:

This confirmed my earlier hypothesis that gossip follows a power law distribution – very few people hold all the enormous hoards of information while the large majority of people have almost negligible information. Gossip primarily follows a hub and spoke model (eg. when someone shares inappropriate pictures of others on a whatsapp group) and in some rare cases especially in private circles (best friends, etc.), it’s point to point.

 

For starters, if you plot the amount of gossip that is propagated by different people (if a particular quantum of gossip is propagated to two different people, we will count it twice), it is quite possible that it follows a power law distribution. This follows from the now well-known result that degree distribution in real-world social networks follows a power law distribution. On top of this if you assume that some people are much more likely to propagate quantums of gossip they know to other people, and that such propensity for propagation is usually correlated with the person’s “degree” (number of connections), the above result is not hard to show.

The next question is on the way gossip actually propagates. The wife looks at the possibilities through two discrete models – hub-and-spoke and peer-to-peer. In the hub-and-spoke models, gossip is likely to spread along the spokes. Let us assume that the high-degree people are the hubs (intuitive), and according to this model, these people collect gossip from spokes (low degree people) and transmit it to others. In this model, gossip seldom propagates directly between two low-degree people.

At the other end is the peer-to-peer model where the likelihood of gossip spreading along an edge (connection between two people) is independent of the nature of the nodes at the end of the edge. In this kind of a model, gossip is equally likely to flow across any edge. However, if you overlay the (scale free/ power law) network structure over this model, then it will start appearing to be like a hub and spoke model!

In reality, neither of these models is strictly true since we also need to consider each person’s propensity to propagate gossip. There are some people who are extremely “sadhu” and politically correct, who think it is morally wrong to propagate unsubstantiated stories. They are sinks as far as any gossip is concerned. The amount of gossip that reaches them is also lower because their friends know that they’re not interested in either knowing or propagating it. On the other hand you have people (like I used to be) who have a higher propensity of propagating gossip. This also results in their receiving more gossip, and they end up propagating more.

So does gossip propagation follow the hub-and-spoke model or peer-to-peer model? The answer is “somewhere in between”, and a function of the correlation between the likelihood of a node propagating gossip and the degree of the node. If the two are uncorrelated (not unreasonable), then the flow will be closer to peer-to-peer (though degree distribution being a power law makes it appear as if it is hub-and-spoke). If there is very high positive correlation between likelihood of propagation and node degree, the model is very close to hub-and-spoke, since the likelihood of gossip flowing between low degree nodes in such a case is very very low, and thus most of the gossip flow happens through one of the hubs. And if the correlation between likelihood of propagation and node degree is low (negative), then it is likely to lead to a flow that is definitely peer-to-peer.

I plan to set up some simulations to actually study the above possibilities and further model how gossip flows!
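As a starting point, a first pass at such a simulation might look like the sketch below. It is a toy under loud assumptions, none of which come from the discussion above: the network is grown by preferential attachment (a standard way to get a heavy-tailed degree distribution), "hubs" are the top 10% of nodes by degree, propensities range from 0.05 to 0.6, and all function names are made up. It measures what share of gossip transmissions involve a hub under positive, zero and negative correlation between propensity and degree.

```python
import random

def preferential_attachment_graph(n, m, rng):
    """Grow a graph where each new node links to m existing nodes picked in proportion to degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # each node appears once per edge it touches
    for i in range(m + 1):  # seed: a small fully connected group
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in sorted(chosen):
            adj[new].add(old)
            adj[old].add(new)
            endpoints += [new, old]
    return adj

def assign_propensity(adj, mode, rng, low=0.05, high=0.6):
    """Give each node a probability of passing gossip on, tied to its degree as `mode` dictates."""
    by_degree = sorted(adj, key=lambda node: len(adj[node]))  # ascending degree
    n = len(by_degree)
    levels = [low + (high - low) * i / (n - 1) for i in range(n)]
    if mode == "negative":
        levels.reverse()      # high-degree nodes get the lowest propensity
    elif mode == "uncorrelated":
        rng.shuffle(levels)   # same values, assigned at random
    return dict(zip(by_degree, levels))

def hub_share(adj, propensity, rng, runs=400, hub_fraction=0.1):
    """Share of all gossip transmissions that have a hub as sender or receiver."""
    by_degree = sorted(adj, key=lambda node: len(adj[node]), reverse=True)
    hubs = set(by_degree[: max(1, int(hub_fraction * len(adj)))])
    via_hub = total = 0
    for _ in range(runs):
        start = rng.randrange(len(adj))  # the gossip originates at a random person
        informed, frontier = {start}, [start]
        while frontier:
            next_frontier = []
            for sender in frontier:
                for receiver in sorted(adj[sender]):
                    if receiver not in informed and rng.random() < propensity[sender]:
                        informed.add(receiver)
                        next_frontier.append(receiver)
                        total += 1
                        via_hub += sender in hubs or receiver in hubs
            frontier = next_frontier
    return via_hub / total if total else float("nan")

rng = random.Random(42)
graph = preferential_attachment_graph(300, 2, rng)
results = {mode: hub_share(graph, assign_propensity(graph, mode, rng), rng)
           for mode in ("positive", "uncorrelated", "negative")}
print(results)
```

A higher hub share corresponds to flow that looks more hub-and-spoke, a lower one to flow that looks more peer-to-peer. The specific numbers depend entirely on the assumed parameters.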

Poverty and distributions

No, this post is not about the distribution of poverty. This is a rather technical post about probability distributions. Just that it has something to add to the poverty debate. And like the previous post, this is a departure from the normal RQ-type posts – there will be no graphs, no tables. Just theorizing.

So in the last week or two a lot of op-ed space in India has been consumed by what is described as the “poverty debate”. A recent survey by the National Sample Survey Organization (NSSO) has revealed that poverty levels in India have declined sharply in the last couple of years. And it only accelerates a sharp decline that started after a similar survey in 2004-05. Now, you have the “growthists” and the “distributionists”. The former claim that it is high economic growth in this time period that has led to the fall in poverty. The latter think it is due to redistributionist policies such as the National Rural Employment Guarantee Act (NREGA). Both sides have their merits. However, I’m not going to step into that debate now.

I ask a more fundamental question – how well can we trust the numbers that the NSSO has put out? My concern is this – that the poverty numbers have been gleaned out of a survey. I don’t have a problem with surveying – in fact surveying is a rather well-studied science, and I’m sure people at the NSSO are well-versed with it. My concern is that in this particular survey, the results may not have been properly extrapolated.

Most surveys rely on what are known as the “law of large numbers” and the “central limit theorem” and assume that the quantity being surveyed (people’s consumption expenditure as per this survey) follows a normal distribution. Except that we know that incomes (at least at the upper side of the scale) don’t follow a normal distribution. Instead, it has been shown that they follow what is called a power law distribution.

While I don’t doubt the general quality of scholarship at the NSSO, I want to ask if they have actually studied the real distribution of incomes and used the appropriate one, rather than using a normal distribution. It could be that incomes at the lower end of the scale actually do follow a normal distribution, in which case standard sampling techniques might be used. If not, however, I expect and hope that the NSSO has used a sampling and extrapolation technique appropriate to the distribution incomes actually follow.

Let me illustrate the issue with an extreme example. Let’s say that one of the names drawn as part of the NSSO’s “random sample” for Mumbai is one Mr. Mukesh D Ambani. Assume that there are 99 other persons in Mumbai who are drawn in the same sample, and each of them has an annual household income of Rs. 1 lakh. What will be the mean income of the group? Assuming Mr. Ambani earns Rs. 10 Crore a year (number pulled out of thin air), the mean income of the group of 100 will come out to be close to Rs. 11 lakh!
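The arithmetic is easy to check. The sketch below uses exactly the hypothetical numbers from the example (99 people at Rs. 1 lakh and one at Rs. 10 crore, with 1 crore = 100 lakh) and adds the median for comparison, which is my addition and not part of the example.

```python
from statistics import mean, median

LAKH = 1
CRORE = 100 * LAKH  # 1 crore = 100 lakh

# The hypothetical sample, measured in lakh of rupees per year.
incomes = [1 * LAKH] * 99 + [10 * CRORE]

sample_mean = mean(incomes)                # pulled up by the single outlier
sample_median = median(incomes)            # what the typical household earns
mean_without_outlier = mean(incomes[:-1])  # the same sample minus Mr. Ambani

print(sample_mean, sample_median, mean_without_outlier)
```

The mean works out to Rs. 10.99 lakh, against Rs. 1 lakh for both the median and the mean of the other 99 households.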

This is the problem with estimating incomes using surveys and standard extrapolation techniques. While the above example might have been extreme, even in smaller groups of population, there will be “local Mukesh Ambanis” – people whose incomes are much higher than their peer group. Inclusion or exclusion of such people in a standard survey can make a massive difference.
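To get a feel for how much inclusion or exclusion of such people matters, here is a minimal sketch that repeats a survey of 100 people many times and compares how far apart the sample means land. The two distributions are assumptions picked only to contrast thin and heavy tails (a normal with mean 3, and a Pareto with exponent 1.5, whose mean is also 3); they are not fitted to NSSO or any other real data.

```python
import random

rng = random.Random(1)

def spread_of_sample_means(draw, sample_size=100, repeats=500):
    """Ratio of the largest to the smallest sample mean across repeated surveys."""
    means = [sum(draw() for _ in range(sample_size)) / sample_size
             for _ in range(repeats)]
    return max(means) / min(means)

# Thin-tailed case: every survey gives nearly the same answer.
normal_spread = spread_of_sample_means(lambda: rng.gauss(3.0, 1.0))
# Heavy-tailed case: a few "local Mukesh Ambanis" swing individual surveys.
pareto_spread = spread_of_sample_means(lambda: rng.paretovariate(1.5))

print(round(normal_spread, 2), round(pareto_spread, 2))
```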

I will end with an example and a request. I remember reading that any family in India that earns over Rs. 12 lakh a year (i.e. Rs. 1 lakh a month) is in the top 1% of all families in India! My family (wife and I) earn more than Rs. 12 lakh. But do we consider ourselves rich? By no means! Why? Because people who are richer than us are much richer than us! That is the problem with quantities that follow a power law distribution.

Now for the request. Can someone instruct me on the easiest way to get the raw data out of the NSSO? Thanks.