Communicating Numbers

Earlier this week I read this masterful blogpost on Andrew Gelman’s blog (though the post itself is not written by Andrew Gelman – it’s written by Phil Price) about communicating numbers.

Basically the way you communicate a number can give a lot more information “between the lines”. Take the example at the top of the article:

“At the New York Marathon, three of the five fastest runners were wearing our shoes.” I’m sure I’m not the first or last person to have realized that there’s more information there than it seems at first. For one thing, you can be sure that one of those three runners finished fifth: otherwise the ad would have said “three of the four fastest.” Also, it seems almost certain that the two fastest runners were not wearing the shoes, and indeed it probably wasn’t 1-3 or 2-3 either: “The two fastest” and “two of the three fastest” both seem better than “three of the top five.” The principle here is that if you’re trying to make the result sound as impressive as possible, an unintended consequence is that you’re revealing the upper limit.

Incredible. So 3 in 5 means one of them is likely to be 5th. And likely one is fourth as well. Similarly, if you see a company that calls itself a “Fortune 500 company”, it is likely closer to 500 than to 100.

The other, slightly unrelated, example quoted in the article is about Covid-19 spread in outdoor conditions. There is another article that says that “less than 10% of covid-19 transmission that happens indoors”. This is misleading because if you say “less than 10%”, people will assume it’s 9%! The number, apparently, is closer to 0.1%.

There are many more such examples that we encounter in real life. If you write on LinkedIn that you went to a “top 10 ranked B-school”, it means you DID NOT go to a “top 5 ranked B-school”.

Loosely related to this, I’ve got a bit irritated over the last year and a bit in terms of imprecise numerical reporting by the media (related to covid-19). I won’t provide links or quotes here, since what I can remember are mostly by one person and I don’t want to implicate her here (and it’s a systemic problem, not unique to her).

You see reports saying “20000 new cases in Karnataka. A majority of them are from Bangalore”. I’ve seen this kind of a report even when 90% of the cases have been from Bangalore, and that is misleading – when you say “majority”, you instinctively think of “50% + 1”. Another report said “as many as 10000 cases”. Now, the “as many as” phrasing makes it sound like a very large number, but put in context, this 10000 wasn’t really very high.

Communication of numbers is an art that is not very well spread. Nowadays we see lots of courses on “telling stories with data”, “data visualisation”, graphics, etc. but none in terms of communication of sheer numbers itself.

Maybe I should record an episode about this in my forthcoming podcast. If you know who might be a good guest for it, AND can make an introduction, let me know.

27% and building narratives using numbers

Some numbers scare you. Some numbers look so unreasonably large that it seems daunting to you, infeasible even. Other numbers, when wrapped in the right kind of narrative, seem so unreasonably small that they sway you (the Rs. 32 per person per day poverty line comes to mind). Thus, when you are dealing with numbers that intuitively look very large or very small, it is important that you build the right narrative around them. Wrap them well so that it doesn’t scare or haunt people. As the old Mirinda Lime ad used to say, “zor ka jhatka.. dheere se lage..”.

So the number in the headline of this blog post is the proposed rate of the Goods and Service Tax. While it is the revenue-neutral amount that needs to be charge should excise and sales and other taxes go, the number looks stupendously large. The way this number was reported on the front pages of business newspapers this morning, it looks so large and out of whack that people might decide that it is better to not have a GST at all.

I’m not blaming the papers for this – they have reported what they’ve been told. It is a question of building narratives by the government. The government, and the GST sub-panel, has done a lousy job of communicating this number, and guiding how it needs to be reported in the media. It is almost as if the way the number was reported is an attempt to further delay the implementation of the GST.

The GST is too important a piece of legislation to be derailed by bad narratives. The government must make every attempt to build a narrative that shows the GST as being conducive to people and to businesses, to show how the transaction costs it reduces will result in better prices for both consumers and businesses, and why it makes lives better. Reporting numbers that look really large doesn’t help matters.

Also, the quant in me is disappointed to see one precise number being put out as the “revenue neutral rate”. Since different goods and services which are now being taxed at differential rates are going to be brought into this one umbrella rate, the real revenue neutral rate is actually a function of the mix of the contribution of each of these goods and services to the GDP. Given that in a dynamic economy these rates are constantly changing, reporting one revenue neutral rate simply doesn’t make sense. A range would be a better way of going about it.

Related to this, given that the revenue neutral rate is a function of mix of goods and services, and this mix will change over time, the assumptions and forecasts that need to be taken into account in the process of fixing the rate are important. The GST panel would do well to take into account the risk of product-and-service mix changing that can make all calculations go awry!

PS: If only they were to hire me as a consultant to this panel 😛


Should you have an analytics team?

In an earlier post, I had talked about the importance of business people knowing numbers and numbers people knowing business, and had put in a small advertisement for my consulting services by mentioning that I know both business and numbers and work at their cusp. In this post, I take that further and analyze if it makes sense to have a dedicated analytics team.

Following the data boom, most companies have decided (rightly) that they need to do something to take advantage of all the data that they have and have created dedicated analytics teams. These teams, normally staffed with people from a quantitative or statistical background, with perhaps a few MBAs, is in charge of taking care of all the data the company has along with doing some rudimentary analysis. The question is if having such dedicated teams is effective or if it is better to have numbers-enabled people across the firm.

Having an analytics team makes sense from the point of view of economies of scale. People who are conversant with numbers are hard to come by, and when you find some, it makes sense to put them together and get them to work exclusively on numerical problems. That also ensures collaboration and knowledge sharing and that can have positive externalities.

Then, there is the data aspect. Anyone doing business analytics within a firm needs access to data from all over the firm, and if the firm doesn’t have a centralized data warehouse which houses all its data, one task of each analytics person would be to get together the data that they need for their analysis. Here again, the economies of scale of having an integrated analytics team work. The job of putting together data from multiple parts of the firm is not solved multiple times, and thus the analysts can spend more time on analyzing rather than collecting data.

So far so good. However, writing a while back I had explained that investment banks’ policies of having exclusive quant teams have doomed them to long-term failure. My contention there (including an insider view) was that an exclusive quant team whose only job is to model and which doesn’t have a view of the market can quickly get insular, and can lead to groupthink. People are more likely to solve for problems as defined by their models rather than problems posed by the market. This, I had mentioned can soon lead to a disconnect between the bank’s models and the markets, and ultimately lead to trading losses.

Extending that argument, it works the same way with non-banking firms as well. When you put together a group of numbers people and call them the analytics group, and only give them the job of building models rather than looking at actual business issues, they are likely to get similarly insular and opaque. While initially they might do well, soon they start getting disconnected from the actual business the firm is doing, and soon fall in love with their models. Soon, like the quants at big investment banks, they too will start solving for their models rather than for the actual business, and that prevents the rest of the firm from getting the best out of them.

Then there is the jargon. You say “I fitted a multinomial logistic regression and it gave me a p-value of 0.05 so this model is correct”, the business manager without much clue of numbers can be bulldozed into submission. By talking a language which most of the firm understands you are obscuring yourself, which leads to two responses from the rest. Either they deem the analytics team to be incapable (since they fail to talk the language of business, in which case the purpose of existence of the analytics team may be lost), or they assume the analytics team to be fundamentally superior (thanks to the obscurity in the language), in which case there is the risk of incorrect and possibly inappropriate models being adopted.

I can think of several solutions for this – but irrespective of what solution you ultimately adopt –  whether you go completely centralized or completely distributed or a hybrid like above – the key step in getting the best out of your analytics is to have your senior and senior-middle management team conversant with numbers. By that I don’t mean that they all go for a course in statistics. What I mean is that your middle and senior management should know how to solve problems using numbers. When they see data, they should have the ability to ask the right kind of questions. Irrespective of how the analytics team is placed, as long as you ask them the right kind of questions, you are likely to benefit from their work (assuming basic levels of competence of course). This way, they can remain conversant with the analytics people, and a middle ground can be established so that insights from numbers can actually flow into business.

So here is the plug for this post – shortly I’ll be launching short (1-day) workshops for middle and senior level managers in analytics. Keep watching this space :)