Scrabble

I’ve forgotten during which stage of lockdown or “unlock” e-commerce for “non-essential goods” reopened, but among the first things we ordered was a Scrabble board. It was an impulse decision. We were on Amazon ordering puzzles for the daughter, and she had just about started putting “sounds” together to make words, so we thought “Scrabble tiles might be useful for her to make words with”.

The thing duly arrived two or three days later. The wife had never played Scrabble before, so on the day it arrived I taught her the rules of the game. We play with the Sowpods dictionary open, so we can check words that the opponent challenges. Our “scrabble vocabulary” has surely improved since we started playing (“Qi” is a lifesaver, btw).

I had insisted on ordering the “official Scrabble board” sold by Mattel. The board is excellent. The tiles are excellent. The bag in which the tiles are stored is also excellent. The only problem is that the set didn’t come with a scoreboard.

On the first day we played (when I taught the wife the rules, and she ended up beating me – I’m so horrible at the game), we used a piece of paper to maintain scores. The next day, we decided to score using an Excel sheet. Since then, we’ve continued to use Excel. The scoring format looks somewhat like this.

So each worksheet contains a single day’s play. Initially after we got the board, we played pretty much every day. Sometimes multiple times a day (you might notice that we played 4 games on 3rd June). So far, we’ve played 31 games. I’ve won 19, Priyanka has won 11 and one ended in a tie.

In any case, scoring on Excel has provided an additional advantage – analytics!! I have an R script that I run after every game, which parses the Excel sheet and does some basic analytics on how we play.
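
For the curious, here is a minimal sketch of what that script does. The workbook layout and column names are my simplification for illustration (one worksheet per game, one column of per-turn scores per player), so treat them as assumptions rather than the actual file format.

```r
# Minimal sketch of the post-game analytics script.
# Assumed layout: one worksheet per game, with columns "Karthik" and
# "Priyanka" holding the score for each turn (hypothetical column names).
library(readxl)
library(dplyr)
library(tidyr)
library(purrr)

scores_file <- "scrabble_scores.xlsx"

all_turns <- map_dfr(excel_sheets(scores_file), function(sheet) {
  read_excel(scores_file, sheet = sheet) %>%
    mutate(game = sheet) %>%
    pivot_longer(c(Karthik, Priyanka), names_to = "player", values_to = "score") %>%
    filter(!is.na(score))
})

# Average points per turn for each player
all_turns %>%
  group_by(player) %>%
  summarise(turns = n(), avg_per_turn = mean(score))
```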

For example, on each turn, I make an average of 16.8 points, while Priyanka makes 14.6. Our score distribution makes for interesting viewing. Basically, she follows a “long tail strategy”. Most of the time, she is content with making simple words, but occasionally she produces a blockbuster.

I won’t put a graph here – it’s not clear enough. This table shows how many times each of us has scored more than a particular threshold in a single turn. The figures are cumulative (a sketch of how these counts are computed follows the table).

Threshold   Karthik   Priyanka
30          50        44
40          12        17
50          5         10
60          3         5
70          2         2
80          0         1
90          0         1
100         0         1
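
The cumulative counts above fall out of a few lines of dplyr, reusing the (hypothetical) all_turns data frame from the earlier sketch:

```r
# How many turns each player has scored at or above each threshold
# (cumulative, as in the table above). Reuses the hypothetical all_turns frame.
library(dplyr)
library(purrr)
library(tidyr)

thresholds <- seq(30, 100, by = 10)

map_dfr(thresholds, function(t) {
  all_turns %>%
    filter(score >= t) %>%
    count(player) %>%
    mutate(threshold = t)
}) %>%
  pivot_wider(names_from = player, values_from = n, values_fill = 0) %>%
  arrange(threshold)
```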

Notice that while I’ve made many more 30+ scores than her, she’s made many more 40+ scores than me. Beyond that, she has crossed every threshold at least as many times as me.

Another piece of analysis is the “score multiple”. This is a measure of “how well we use our letters”. For example, if I place the word “tiger” on a double word score (and no double or triple letter score), I get 12 points. The points total on the tiles is 6, giving me a multiple of 2.
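
If you want to compute this yourself, the multiple for a single move is just the points scored divided by the face value of the tiles played. A small sketch (ignoring blanks):

```r
# Score multiple for a single move: points scored divided by the face value
# of the tiles used (blanks ignored for simplicity).
tile_values <- c(a = 1, b = 3, c = 3, d = 2, e = 1, f = 4, g = 2, h = 4, i = 1,
                 j = 8, k = 5, l = 1, m = 3, n = 1, o = 1, p = 3, q = 10, r = 1,
                 s = 1, t = 1, u = 1, v = 4, w = 4, x = 8, y = 4, z = 10)

score_multiple <- function(word, points_scored) {
  face_value <- sum(tile_values[strsplit(tolower(word), "")[[1]]])
  points_scored / face_value
}

score_multiple("tiger", 12)  # placed on a double word score: 12 / 6 = 2
```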

Over the games I have found that I have a multiple of 1.75, while she has a multiple of 1.70. So I “utilise” the tiles that I have (and the ones on the board) a wee bit “better” than her, though she often accuses me of “over optimising”.

It’s been fun so far. There was a period of time when we were addicted to the game, and we still turn to it when one of us is in a “work rut”. And thanks to maintaining scores on Excel, the analytics afterwards is also fun.

I’m pretty sure you’re spending the lockdown playing some board game as well. I strongly urge you to use Excel (or equivalent) to maintain scores. The analytics provides a very strong collateral benefit.

 

This year on Spotify

I’m rather disappointed with my end-of-year Spotify report this year. I mean, I know it’s automated analytics, and no human has really verified it, etc., but there are some basics that the algorithm failed to cover.

The first few slides of my “annual report” told me that my listening changed by seasons. That in January to March, my favourite artists were Black Sabbath and Pink Floyd, and from April to June they were Becky Hill and Meduza. And that from July onwards it was Sigala.

Now, there was a life-changing event that happened in late March which Spotify knows about, but failed to acknowledge in the report – I moved from the UK to India. And in India, Spotify’s inventory is far smaller than it is in the UK. So some of the bands I used to listen to heavily in the UK, like Black Sabbath, went off my playlist in India. My daughter’s lullaby playlist, which is the most consumed music for me, moved from Spotify to Amazon Music (and more recently to Apple Music).

The other thing with my Spotify use-case is that it’s not just me who listens to it. I share the account with my wife and daughter, and while I know that Spotify has an algorithm for filtering out kid stuff, I’m surprised it didn’t figure out that two people are sharing this account (and pitch us a family subscription).

According to the report, these are the most listened to genres in 2019:

Now there are two clear classes of genres here. I’m surprised that Spotify failed to pick it out. Moreover, the devices associated with my account that play Rock or Power Metal are disjoint from the devices that play Pop, EDM or House. It’s almost like Spotify didn’t want to admit that people share accounts.

Then there were some three slides on my podcast listening for the year, when I’ve listened to a total of five hours of podcasts on Spotify. If I, a human, were building this report, I would have dropped this section citing insufficient data, rather than wasting three slides on analytics that simply don’t make sense.

I see the importance of this segment in Spotify’s report, since they want to focus more on podcasts (being an “audio company” rather than a “music company”), but something in the report to encourage me to use Spotify for more podcasts (maybe recommending Spotify exclusives I might like, even if based on limited data) might have helped.

Finally, take a look at our most played songs in 2019.

It looks like my daughter’s sleeping playlist threaded with my wife’s favourite songs (after a point the latter dominate). “My songs” are nowhere to be found – I have to go all the way down to number 23 to find Judas Priest’s cover of Diamonds and Rust. I mean I know I’ve been diversifying the kind of music that I listen to, while my wife listens to pretty much the same stuff over and over again!

In any case, automated analytics is all fine, but there are some not-so-edge cases where the reports it generates are obviously bad. Hopefully the people at Spotify will figure this out and use more intelligence in producing next year’s report!

EPL: Mid-Season Review

Going into the November international break, Liverpool are eight points ahead at the top of the Premier League. Defending champions Manchester City have slipped to fourth place following their loss to Liverpool. The question most commentators are asking is if Liverpool can hold on to this lead.

We are two-thirds of the way through the first round robin of the Premier League. The thing with evaluating league standings midway through the round robin is that it doesn’t account for the fixture list. For example, Liverpool have finished playing the rest of the “big six” (or seven, if you include Leicester), but Manchester City still have many games to play against the top teams.

So my practice over the years has been to compare team performance to corresponding fixtures in the previous season, and to look at the points difference. Then, assuming the rest of the season goes just like last year, we can project who is likely to end up where.

Now, relegation and promotion introduce a complication, but we can “solve” that by replacing last season’s relegated teams with this season’s promoted teams (the 18th-placed team replaced by the Championship winners, the 19th by the Championship runners-up, and the 20th by the Championship playoff winners).
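
In code, the comparison is essentially a join of this season’s fixtures against last season’s corresponding results. Here is a sketch, assuming two data frames (this_season and last_season) with columns home, away, home_pts and away_pts, and with last season’s relegated teams already renamed to this season’s promoted teams – all of these names are my assumptions, not the actual script.

```r
# Points differential versus corresponding fixtures of last season.
# this_season / last_season are assumed data frames with columns
# home, away, home_pts, away_pts (3/1/0 from each match).
library(dplyr)

points_by_team <- function(results) {
  bind_rows(
    results %>% transmute(team = home, pts = home_pts),
    results %>% transmute(team = away, pts = away_pts)
  ) %>%
    group_by(team) %>%
    summarise(points = sum(pts), .groups = "drop")
}

played <- this_season %>% select(home, away)  # fixtures played so far

last_season_same_fixtures <- last_season %>%
  semi_join(played, by = c("home", "away"))

points_by_team(this_season) %>%
  inner_join(points_by_team(last_season_same_fixtures),
             by = "team", suffix = c("_this", "_last")) %>%
  mutate(differential = points_this - points_last) %>%
  arrange(desc(differential))
```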

It’s not the first time I’m doing this analysis. I’d done it once in 2013-14, and once in 2014-15. You will notice that the graphs look similar as well – that’s how lazy I am.

Anyway, this is the points differential thus far compared to the corresponding fixtures of last season.

Leicester are the most improved team from last season, having picked up 8 points more than in the corresponding fixtures last season. Sheffield United, albeit starting from a low base, have also done extremely well. And last season’s runners-up Liverpool are on a plus 6.

The team that has done worst relative to last season is Tottenham Hotspur, at minus 13. Key players entering the final years of their contract and not signing extensions, and scanty recruitment over the last 2-3 years, haven’t helped. And then there is Manchester City at minus 9!

So assuming the rest of the season’s fixtures go according to last season’s corresponding fixtures, what will the final table look like at the end of the season?

We see that if Liverpool replicate their results from last season for the rest of the fixtures, they should win the league comfortably.

What is more interesting is the gaps between 1-2, 2-3 and 3-4. Each of the top three positions is likely to be decided “comfortably”, with a fairly congested mid-table.

As mentioned earlier, this kind of analysis is unfair to the promoted teams. It is highly unlikely that Sheffield will get relegated based on the start they’ve had.

We’ll repeat this analysis after a couple of months to see where the league stands!

Periodicals and Dashboards

The purpose of a dashboard is to give you a live view of what is happening with the system. Take, for example, the instrument it is named after – the car dashboard. It tells you, at any given moment, what the speed of the car is, along with other indicators such as which lights are on, the engine temperature, fuel levels, etc.

Not all reports, however, need to be dashboards. Some reports can be periodicals. These periodicals don’t tell you what’s happening at a moment, but give you a view of what happened in or at the end of a certain period. Think, for example, of classic periodicals such as newspapers or magazines, in contrast to online newspapers or magazines.

Periodicals tell you the state of a system at a certain point in time, and also give information of what happened to the system in the preceding time. So the financial daily, for example, tells you what the stock market closed at the previous day, and how the market had moved in the preceding day, month, year, etc.

Doing away with metaphors, business reporting can be classified into periodicals and dashboards. And they work exactly like their metaphorical counterparts. Periodical reports are produced periodically and tell you what happened in a certain period or at a certain point of time in the past. A good example is company financials – an income statement describes what happened over a period, and a balance sheet describes the state of the company at a point in time.

Once a periodical is produced, it is frozen in time for posterity. Another edition will be produced at the end of the next period, but it is a new edition. It adds to the earlier periodical rather than replacing it. Periodicals thus have historical value and because they are preserved they need to be designed more carefully.

Dashboards, on the other hand, are fleeting, and not usually preserved for posterity. They are instead overwritten. Whether all systems are up this minute doesn’t matter a minute later if you haven’t reacted to the report this minute; the information ceases to be of importance the next minute (of course, some aspects might still matter at a later date, and those will be captured in the next periodical).

When we are designing business reports and other “business intelligence systems” we need to be cognisant of whether we are producing a dashboard or a periodical. The fashion nowadays is to produce everything as a dashboard, perhaps because there are popular dashboarding tools available.

However, dashboards are expensive. For one, they need a constant connection to be maintained to the “system” (database or data warehouse or data lake or whatever other storage unit, in the business report sense). Also, by definition they are not stored, and if you do need to store them, you have to decide on a frequency of storage, which makes them periodicals anyway.

So companies can save significantly on resources (compute and storage) by switching from dashboards (which everyone seems to think in terms of) to periodicals. The key here is to get the frequency of the periodical right – too frequent, and people will get bugged; not frequent enough, and people will get bugged by the lack of information. Given the tools and technologies at hand, we can even make reports “on demand” (for stuff not used by too many people).
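
And “on demand” really can be cheap. With something like R Markdown, a periodical is just a parameterised report that gets rendered when someone asks for it – the file name and parameters below are made up for illustration:

```r
# Rendering a "periodical" on demand instead of keeping a dashboard live.
# The report file and its parameters are hypothetical.
library(rmarkdown)

render("monthly_report.Rmd",
       params = list(period_start = "2019-11-01", period_end = "2019-11-30"),
       output_file = "monthly_report_2019-11.html")
```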

Vlogging!

The first seed was sown in my head by Harish “the Psycho” J, who told me a few months back that nobody reads blogs any more, and I should start making “analytics videos” to increase my reach and hopefully hit a new kind of audience with my work.

While the idea was great, I wasn’t sure for a long time what videos I could make. After all, I’m not the most technical guy around, and I had no patience for making videos on “how to use regression” and stuff like that. I needed a topic that would be both potentially catchy and something where I could add value. So the idea remained an idea.

For the last four or five years, my most common lunchtime activity has been to watch chess videos. I subscribe to the YouTube channels of Daniel King and Agadmator, and most days when I eat lunch alone at home are spent watching their analyses of games. Usually this routine gets disrupted on Fridays when the wife works from home (she positively hates these videos), but one Friday a couple of months back I decided to ignore her anyway and watch the videos (she was in her room working).

She had come out to serve herself another helping of whatever she had made that day and saw me watching the videos, and suddenly asked me why I couldn’t make such videos as well. She has seen me work over the last seven years to build what I think is a fairly cool cricket visualisation, and said that I should use it to make little videos analysing cricket matches.

And since then my constant “background process” has been to prepare for these videos. Earlier, Stephen Rushe of Cricsheet used to unfailingly upload ball by ball data of all cricket matches as soon as they were done. However, two years back he went into “maintenance mode” and has stopped updating the data. And so I needed a method to get data as well.

Here, I must acknowledge the contributions of Joe Harris of White Ball Analytics, who not only showed me the APIs to get ball by ball data of cricket matches, but also gave very helpful inputs on how to make the visualisation more intuitive, and palatable to the normal cricket fan who hasn’t seen such a thing before. Joe has his own win probability model based on ball by ball data, which I think is possibly superior to mine in a lot of scenarios (my model does badly in high-scoring run chases), though I’ve continued to use my own model.

So finally the data is ready, and I have a much improved visualisation compared to what I had during the IPL last year, and I’ve created what I think is a nice app using the Shiny package that you can check out for yourself here. This covers all T20 international games, and you can use the app to see the “story of each game”.
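
For those curious about the plumbing, the skeleton of such a Shiny app is pretty small. This is an illustrative sketch rather than the actual app – match_data and plot_game_story() are hypothetical stand-ins for my data and plotting code:

```r
# Minimal skeleton of a "match story" Shiny app. match_data (a named list of
# ball-by-ball data frames) and plot_game_story() are hypothetical.
library(shiny)

ui <- fluidPage(
  titlePanel("T20 match stories"),
  sidebarLayout(
    sidebarPanel(selectInput("match", "Choose a match:", choices = names(match_data))),
    mainPanel(plotOutput("story"))
  )
)

server <- function(input, output) {
  output$story <- renderPlot({
    plot_game_story(match_data[[input$match]])  # hypothetical plotting helper
  })
}

shinyApp(ui, server)
```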

And this is where the vlogging comes in – in order to explain how the model works and how to use it, I’ve created a short video. You can watch it here:

While I still have a long way to go in terms of my delivery, you can see that the video has come out rather well. There are no sync issues, and you see my face in one corner as well. This was possible thanks to my school friend Sunil Kowlgi’s Outklip app. It’s a pretty easy-to-use Chrome app, and the videos are immediately available on the platform. There is quick YouTube integration as well, for you to upload them.

And this is not a one-time effort – going forward I’ll be making videos of limited overs games, analysing them using my app, and posting them on my YouTube channel (or maybe I’ll make a new channel for these videos. I’ll keep you updated). I hope to become a regular vlogger!

So in the meantime, watch the above video. And give my app a spin. Soon I’ll be releasing versions covering One Day Internationals and franchise T20s as well.

 

Programming Languages

About a decade ago, I used to make fun of information technology companies that hired developers based on the language they coded in. My contention was that writing code is a skill that you either have or you don’t, and what a potential employer needs to look for is the ability to think algorithmically, and then render ideas in code.

While I’ve never worked as a software engineer, I find myself writing more and more code over the years as a part of doing data analysis. The primary tool I use is R, where coding doesn’t really feel like coding, since it is a rather high level language. However, I’m occasionally asked to show code in Python, since some clients are more proficient in that, and the one thing that has done is teach me the value of domain knowledge of a programming language.

I take this opportunity to apologise for my prior belief that all that matters is thinking algorithmically, and that the language in which the ideas are expressed doesn’t matter.

This is because the language you usually program in subtly nudges you towards thinking in a particular way. Having mostly used R over the last decade, I think in terms of tables and data frames, and after having learnt tidyverse earlier this year, my way of thinking algorithmically has become in a weird way “object oriented” (no, this has nothing to do with classes). I take an “object” (a data frame) and then manipulate it in various ways, changing it, summarising stuff, calculating things on the fly and aggregating, until the point where the result comes out in an elegant manner. 
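
To give a flavour of what I mean, here is the kind of pipeline I find myself writing – the sales data frame and its columns are invented for illustration:

```r
# The "take a data frame and keep transforming it" style described above.
# The sales data frame and its columns are invented for illustration.
library(dplyr)

sales %>%
  filter(!is.na(amount)) %>%
  mutate(margin = amount - cost) %>%
  group_by(region, month) %>%
  summarise(revenue = sum(amount), total_margin = sum(margin), .groups = "drop") %>%
  arrange(desc(revenue))
```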

And while Pandas allows chaining (in fact, I suspect it is from Pandas that the tidyverse guys got the idea for the “%>%” chaining operator), it is by no means as complete in its treatment of chaining as R, and that makes things tricky.

Moreover, being proficient in R makes you think in terms of vectorised operations, and Python doesn’t necessarily offer that, so operations that were once simple in R become rather complicated in Python, involving list comprehensions and whatnot.

Putting it another way, thinking algorithmically in the framework offered by one programming language makes it rather stressful to express these thoughts in another language where the way of algorithmic thinking is rather different. 

For example, I’ve never got the point of the index in pandas dataframes, and I only find myself “resetting” it constantly so that my way of addressing isn’t mangled. Compared to the intuitive syntax in R, which is first and foremost a data analysis tool, and where the data frame is “native”, the programming-language approach of Python, with its locs and ilocs, is again irritating.

I can go on… 

And I’m guessing this feeling is mutual – someone used to doing things the Python way would find R’s syntax and way of doing things rather irritating. R’s machine learning toolkit, for example, is nowhere near as easy to use as scikit-learn is in Python (this doesn’t affect me since I seldom need to use machine learning – for example, I use regression less than 5% of the time in my work).

The next time I see a job opening for a “Java developer” I will not laugh like I used to ten years ago. I know that the posting is looking for a developer who can not only think algorithmically, but also think algorithmically in the way that is most convenient to express in Java. And unlearning one way of algorithmic thinking and learning another isn’t particularly easy.

Analytics for general managers

While good managers have always been required to be analytical, the level of analytical ability being asked of managers has been going up over the years, with the increase in availability of data.

Now, this post is once again based on that one single and familiar data point – my wife. In fact, if you want me to include more data in my posts, you should talk to me more.

Leaving that aside, my wife works as a mid-level manager for an extremely large global firm. She was recruited straight out of business school for a “MBA track” program. And from our discussions about her work in the first few months, one thing she did lots of was writing SQL queries. And she still spends a lot of her time writing queries and building Excel models.

This isn’t something she was trained for, or was tested on while being recruited. She did her MBA at a famously diverse global business school, the diversity of its student body meaning that the level of maths and quantitative methods was kept rather low. She was recruited as a “general manager”. Yet, in a famously data-driven company, she spends a considerable amount of time on quantitative stuff.

It wasn’t always like this. While analytical ability is what (in my opinion) set apart graduates of elite MBA programs from those of middling MBA programs, the level of quantitative ability expected of MBAs (apart from maybe those in finance) wasn’t too high. You were expected to know how to use spreadsheets. You were expected to know some rudimentary statistics – means and standard deviations and some basic hypothesis testing, maybe. And you were expected to be able to make managerial decisions based on numbers. That’s about it.

Over the years, though, as the corpus of data within (and outside) organisations has grown, and making decisions based on data has become fashionable (a brilliant thing as far as I’m concerned), the requirement from managers has grown as well. Now they are expected to do more with data, and aren’t always trained for that.

Some organisations have responded to this problem by supplying “data analysts” who are attached to mid level managers, so that the latter can outsource the analytical work to the former and spend most of their time on “managerial” stuff. The problem with this is twofold – it is hard to guarantee a good career path to this data analyst (which makes recruitment hard), and this introduces “friction” – the manager needs to tell the analyst what precise data and analysis she needs, and iterating on this can lead to a lot of time lost.

Moreover, as the size of the data has grown, the complexity of the analysis that can be done and the insights that can be produced has become greater as well. And in that sense, managers who have been able to adapt to the volume and complexity of data have a significant competitive advantage over their peers who are less comfortable with data.

So what does all this mean for general managers and their education? First, I would expect the smarter managers to know that data analysis ability is a competitive advantage, and so invest time in building that skill. Second, I know of some business schools that are making their MBA programs less quantitative, as their student body becomes more diverse and the recruitment body becomes less diverse (banks are recruiting far less nowadays). This is a bad move. In fact, business schools need to realise that a quantitative MBA program is more of a competitive advantage nowadays, and tune their programs accordingly, while not compromising on the diversity of the student intake.

Then, there is a generation of managers that got along quite well without getting its hands dirty with data. These managers will now get challenged by younger managers who are more conversant with data. It will be interesting to see how organisations deal with this dynamic.

Finally, organisations need to invest in training programs, to make sure that their general managers are comfortable with data, and analysis, and making use of internal and external data science resources. Interestingly enough (I promise I hadn’t thought of this when I started writing this post), my company offers precisely one such workshop. Get in touch if you’re interested!

The missing middle in data science

Over a year back, when I had just moved to London and was job-hunting, I was getting frustrated by the fact that potential employers didn’t recognise my combination of skills of wrangling data and analysing businesses. A few saw me purely as a business guy, and most saw me purely as a data guy, trying to slot me into machine learning roles I was thoroughly unsuited for.

Around this time, I happened to mention this lack of fit to my wife, and she had then remarked that the reason companies want either pure business people or pure data people is that you can’t scale a business with people with a unique combination of skills. “There are possibly very few people with your combination of skills”, she had said, and hence companies had gotten around the problem by getting some very good business people and some very good data people, and hoping that they can add value together.

More recently, I was talking to her about some of the problems that she was dealing with at work, and recognised one of them as being similar to what I had solved for a client a few years ago. I quickly took her through the fundamentals of K-means clustering, and showed her how to implement it in R (and in the process, taught her the basics of R). As it had with my client many years ago, clustering did its magic, and the results were literally there to see, the business problem solved. My wife, however, was unimpressed. “This requires too much analytical work on my part”, she said, adding that “If I have to do this level of analytical work, I won’t have enough time to execute my managerial duties”.
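
For what it’s worth, the whole exercise is only a few lines of R. Here is a sketch of the sort of thing I showed her, using a built-in dataset as a stand-in for the actual (confidential) business data:

```r
# A quick K-means exercise of the sort described above, with a built-in
# dataset standing in for the actual business data.
set.seed(42)

features <- scale(iris[, 1:4])           # standardise the numeric columns
clusters <- kmeans(features, centers = 3, nstart = 25)

table(clusters$cluster)                  # cluster sizes
aggregate(iris[, 1:4], by = list(cluster = clusters$cluster), FUN = mean)  # cluster profiles
```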

This made me think about the (yet unanswered) question of who should be solving this kind of a problem – taking a business problem, recognising that it can be solved using data, figuring out the right technique to apply, and then communicating the results in a way that the business can easily understand. And this was a one-time problem, not something you would need to solve repeatedly, and so without the requirement to set up a pipeline and data engineering and IT infrastructure around it.

I admit this is just one data point (my wife), but based on observations from elsewhere, managers are usually loath to get their hands dirty with data, beyond perhaps doing some basic MS Excel work. Data science specialists, on the other hand, will find it hard to quickly get intuition for a one-time problem, get data in a “dirty” manner, apply the right technique to solve it, and communicate the results in a business-friendly manner. Moreover, data scientists are highly likely to be involved in regular repeatable activities, making it an organisational nightmare to “lease” them for such one-time efforts.

This is what I call the “missing middle problem” in data science. Problems whose solutions will without doubt add value to the business, but which most businesses are unable to address because of a lack of the required skill set; and whose one-time nature makes it difficult for businesses to dedicate permanent resources to solve.

I guess so far this post has all the makings of a sales pitch, so let me turn it into one – this is precisely the kind of problem that my company Bespoke Data Insights is geared to solving. We specialise in solving problems that lie at the cusp of business and data. We provide end-to-end quantitative solutions for typically one-time business problems.

We come in, understand your business needs, and use a hypothesis-driven approach to model the problem in data terms. We select methods that in our opinion are best suited for the precise problem, not hesitating to build our own models if necessary (hence the Bespoke in the name). And finally, we synthesise the analysis in the form of recommendations that any business person can easily digest and act on.

So – if you’re facing a business problem where you think data might help, but don’t know how to proceed; or if you are curious about all this talk about AI and ML and data science, and want to include it in your business; or you want your business managers to figure out how to use the data teams better – hire us.

Why data scientists should be comfortable with MS Excel

Most people who call themselves “data scientists” aren’t usually fond of MS Excel. It is slow and clunky, can only handle a million rows of data (and nearly crash your computer if you go anywhere close to that), and despite the best efforts of Visual Basic, is not very easy to program for doing repeatable tasks.

In fact, some data scientists may consider Excel to be “too downmarket” for them to use. At one firm I worked for, I had heard a rumour that using Excel for modelling was a fire-able offence, though I’m glad to report that I flouted this rule without much adverse effect. Yet, in my years as a “data science” and analytics consultant, and having done several modelling jobs before, I think Excel is an extremely necessary tool in a data scientist’s arsenal. There are several reasons for this.

The main one is communication. “Business types” love Excel – they use it for pretty much every official activity (I know of people who write documents in Excel). If you ask for a set of numbers, you are most likely to find it in an Excel sheet. I know of fairly large organisations which use Excel to store and transmit data (admittedly poor usage). And even non-quantitative business types understand some of the basic quantitative functions thanks to Excel, such as joining (VLOOKUP), pivoting, basic data cleaning (TRIM, VALUE, etc.), averaging, visualisation and sometimes even basic statistics such as correlation and regression.

One of the main problems that organisations face is lack of communication between data scientists and the business side (I mentioned this in a talk I gave last month: video here and slides here). Excel is an excellent middle ground, since it is reasonably quantitative and business people know how to use it.

In fact, in my consulting experience I’ve found that when working with clients, using Excel can make your client (usually a business person) feel more comfortable and involved in the analysis, speeding up the process and significantly improving collaboration. They’ll feel more empowered to intervene, which means they can add value, and they can feel especially happy if you occasionally let them enter some simple quantitative formulae.

The next advantage of Excel is that it puts the numbers out there. A long time back, when I was still doing full time jobs, I was asked to build a forecasting model (using a programming language) and couldn’t get it right for several months. And then on a whim I decided to use Excel, and when I saw the data in front of me, it was clear why the forecasts were so useless – because the data was so random.

Excel also allows you to quickly try things and iterate, again by putting the data and the analysis in front of you. Admittedly, the toolkit available is limited compared to what programming languages or statistical software can offer, but through clever usage (especially with Visual Basic), there is a lot you can achieve.

Then, Excel sometimes nudges you towards finding simple solutions. It is possible when you’re using a programming language to veer towards overly complicated solutions, and possibly use the proverbial nuclear weapon against the sparrow.

When I was working on the forecasting work a decade ago, I found that the forecasts would feed into a fairly complicated-looking model that had been developed over several years by several developers. On a whim, I decided to “do more” in Excel and managed to replicate the entire model in Excel (using VB and Solver). The people leading the product weren’t particularly happy, but using Excel was critical in ultimately moving to a simpler solution.

A similar thing occurred recently as well. I had been building a fairly complex optimisation model, which I tried replicating in Excel for communication purposes (so I could work on it together with the client). And it turned out there was a far simpler solution that I had missed all this time, and the simpler solution became apparent only because I used Excel.

I’m sure this is not an exhaustive list. So, if you’re a data scientist, you will do well to be at least conversant with Excel. I know it may only serve limited needs in terms of analysis, but the effort in learning it will be more than compensated for by the gains in communication, collaboration and simplicity.

Tailpiece:
A long time ago, a co-worker passed by my desk and saw me work on Excel. He saw my spreadsheet and remarked, “oh, so many numbers! it must be very complicated” and went on his way. I don’t know if he is a data scientist now.

Stocks and flows

One common mistake even a lot of experienced analysts make is comparing stocks to flows. Recently, for example, Apple’s trillion dollar valuation was compared to countries’ GDP. A few years back, an article compared the quantum of bad loans in Indian banks to the country’s GDP. Following an IPL auction a few years back, a newspaper compared the salary of a player to the market cap of some companies (paywalled).

The simplest way to reason why these comparisons don’t make sense is that they are comparing variables that have different dimensionality. Stock variables are usually measured in dollars (or pounds or euros or whatever), while flows are usually measured in terms of currency per unit time (dollars per year, for example).

So to take some simple examples: your salary might be $100,000 per year. The current value of your stock portfolio might be $10,246. India’s GDP is 2 trillion dollars per year. Liverpool FC paid £67 million to buy out Alisson’s contract at AS Roma, and will pay him a salary of about £77,000 per week. Apple’s market capitalisation is 1.05 trillion dollars, and its sales as per the latest financials are 229 billion dollars per year.

Get the drift? The simplest way to avoid confusing stocks and flows is to be explicit about the dimensionality of the quantity being compared – flows have a “per unit time” suffixed to their dimensions.

Following the news of Apple’s market cap hitting a trillion dollars, I put out a tweet about the fallacy of comparing it to the GDP of the United States.

A lot of the questions that followed came from stock market analysts, who are used to looking at companies in terms of financial ratios, most of which involve both stocks and flows. They argued that because these ratios are well-established, it is legitimate to compare stocks to flows.

For example, we get the Price to Earnings ratio by dividing a company’s stock price (a stock) by the company’s annual earnings per share (a flow). The asset turnover ratio is derived by dividing the annual revenues (a flow) by the amount of assets (a stock). In fact, barring simple ratios such as gross margin, most ratios in financial analysis involve dividing a stock by a flow or the other way round.

To put it simply, financial ratios are not a case of comparing stocks to flows because ratios by themselves don’t mean a thing, and their meaning is derived from comparing them to similar ratios from other companies or geographies or other points in time.

A price to earnings ratio is simply the ratio of price per share to (annual) earnings per share, and has the dimension of “years”. When we compute the P/E ratio, we are not comparing price to earnings, since that would be nonsensical (they have different dimensions). We are dividing one by the other and comparing the ratio itself to historic or global benchmarks.
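
To make the dimensional arithmetic explicit:

P/E = (price per share) / (annual earnings per share) = ($ per share) / ($ per share per year) = years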

The reason a company with a P/E ratio of 25 (for example) is seen as being overvalued is that this value lies at the upper end of the distribution of historical P/E ratios. So we are comparing one ratio to another (with both having the same dimension).

In conclusion, when you take the ratio of one quantity to another, you are just computing a new quantity – you are not comparing the numerator to the denominator. And when you compare quantities, always make sure that you are being dimensionally consistent.