FaceTime Baby

My nephew Samvit, born in 2011, doesn’t talk much on the phone. It’s possibly because he didn’t talk much on the phone as a baby, but I’ve never been able to have a decent phone conversation with him (we get along really well when we meet, though). He says a couple of lines, hands the phone back to his mother, and runs off. If it’s a video call, he appears, says hi and disappears.

Berry (born in 2016), on the other hand, seems to have in a way “leapfrogged” the phone. We moved to London when she was five and a half months old, and since then we’ve kept in touch with my in-laws and other relatives primarily through video chat (FaceTime etc.). And so Berry has gotten used to seeing all these people on video, and has become extremely comfortable with the medium.

For example, when we were returning from our last Bangalore trip in December, we were worried that Berry would miss her grandparents tremendously. As it turned out, we landed in London and video called my in-laws, and Berry was babbling away as if there was no change in scene!

Berry has gotten so used to video calling that she doesn’t seem to get the “normal” voice call. To be sure, she loves picking up the phone, holding it against her ear, saying “hello” and making pretend conversations (apparently she learnt this at her day care). But give her a phone and ask her to talk, and she goes quiet unless there’s another person appearing on screen.

Like there’s this one aunt of mine who is so tech-phobic that she doesn’t use video calls. And every time I call her she wants to hear Berry speak, except that Berry won’t speak because there is nobody on the screen! I’m now trying to figure out how to get this aunt to get comfortable with video calling just so that Berry can talk to her!

 

In that sense, Berry is a “video call” native. And I wouldn’t be surprised if she finds it really hard to get comfortable with audio calls later on in life.

I’ll turn into one uncle now and say “kids nowadays…”

Religion and survivorship bias

Biju Dominic of FinalMile Consulting has a piece in Mint about “what CEOs can learn from religion”. In it, he says,

Despite all the hype, the vast majority of these so-called highly successful, worthy of being emulated companies, do not survive even for few decades. On the other hand, religion, with all its inadequacies, continues to survive after thousands of years.

This is a fallacious comparison.

Firstly, comparing “religion” to a particular company isn’t dimensionally consistent. A better comparison would be at the conceptual level – “religion” versus the “joint stock company”, say. And like the former, the latter has done rather well for 300 years now, even if specific companies fold up after a few years.

The other way to make an apples-to-apples comparison is to compare a particular company to a particular religion. And this is where survivorship bias comes in.

Most of the dominant religions of today are hundreds, if not thousands, of years old. In the course of their journey to present-day strength, they first established their own base and then fought off competition from other upstart religions.

In other words, when Dominic talks about “religion” he is only taking into account religions that have displayed memetic fitness over a really long period. What he fails to take into account are the thousands of startup religions that spring up every few years and then fade into nothingness.

Historically, such religions haven’t been well documented, but that doesn’t mean they didn’t exist. In contemporary times, one can only look at the thousands of “babas” with cults all around India – each is leading his/her own “startup religion”, and most of them are likely to sink without a trace.

Comparing the best in one class (religions that have survived and thrived over thousands of years) to the average of another class (the average corporation) just doesn’t make sense!

 

Stirring the pile efficiently

Warning: This is a technical post, and involves some code, etc. 

As I’ve ranted a fair bit on this blog over the last year, a lot of “machine learning” in the industry can be described as “stirring the pile”. Regular readers of this blog will be familiar with this image from XKCD by now:

Source: https://xkcd.com/1838/

Basically, people simply take datasets and apply all the machine learning techniques they have heard of (implementation is damn easy – scikit-learn allows you to implement just about any model in three similar-looking lines of code; see my code here to see how similar the implementations are).
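
To illustrate (a minimal sketch, not the notebook itself; the circular “true” region here is just an assumed example), the same three-step pattern covers wildly different models:

```python
# A minimal sketch (assumed example, not the notebook itself) of how uniform
# the scikit-learn API is: every classifier follows the same
# instantiate / fit / predict pattern, whatever the underlying model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))       # two standard normal predictors
z = X[:, 0] ** 2 + X[:, 1] ** 2 < 1      # a "clean" binary target (a circle)

for Model in (LogisticRegression, SVC, RandomForestClassifier):
    clf = Model()                         # 1. instantiate
    clf.fit(X, z)                         # 2. fit
    z_hat = clf.predict(X)                # 3. predict
    print(Model.__name__, (z_hat == z).mean())
```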

So I thought I’d help these pile-stirrers by giving some hints on which method to use for different kinds of data. I’ve over-simplified things, so assume that:

  1. There are two predictor variables X and Y. The predicted variable “Z” is binary.
  2. X and Y are each drawn from a standard normal distribution.
  3. The predicted variable Z is “clean” – there is a region in the X-Y plane where Z is always “true” and another region where Z is always “false” (the sketch after this list generates a few such datasets).
  4. So the idea is to see which machine learning techniques are good at identifying which kinds of geometric regions.
  5. Everything is done “in-sample”. Given the nature of the data, it doesn’t matter if we do it in-sample or out-of-sample.
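
Here is a sketch of the kind of datasets this implies (the specific shapes are assumed examples; the notebook has its own set):

```python
# Synthetic "clean" datasets per the assumptions above: X and Y are standard
# normal, and Z is defined by a simple geometric region in the X-Y plane.
# The specific shapes are assumed examples.
import numpy as np

rng = np.random.default_rng(42)
n = 2000
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

shapes = {
    "circle":      X ** 2 + Y ** 2 < 1,
    "square":      (np.abs(X) < 1) & (np.abs(Y) < 1),
    "two circles": ((X - 1) ** 2 + Y ** 2 < 0.5) | ((X + 1) ** 2 + Y ** 2 < 0.5),
    "hyperbola":   X * Y > 0.5,             # two disjoint "true" regions
}
features = np.column_stack([X, Y])           # the predictors, ready for fitting
```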

For those who understand Python (and every pile-stirrer worth his salt is excellent at Python), I’ve put my code in a nice Jupyter Notebook, which can be found here.

So this is what the output looks like. The top row shows the “true values” of Z. Then we have a row for each of the techniques we’ve used, which shows how well these techniques can identify the pattern given in the top row.
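
In outline, the comparison works something like this (a sketch with an assumed model list and grid resolution, not the notebook itself): fit each model in-sample, then ask it to classify every point on a dense grid over the plane, and draw that as one row of the figure.

```python
# Sketch of the comparison: fit each classifier in-sample, then see how well
# it reproduces the true region on a dense grid (one row of the figure each).
# The model list and grid resolution are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))
z = (np.abs(X[:, 0]) < 1) & (np.abs(X[:, 1]) < 1)   # the "square" dataset

# a dense grid over the X-Y plane on which each model draws its view of Z
gx, gy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.column_stack([gx.ravel(), gy.ravel()])

models = {
    "logistic regression": LogisticRegression(),
    "linear SVM":          SVC(kernel="linear"),
    "RBF SVM":             SVC(kernel="rbf"),
    "naive Bayes":         GaussianNB(),
    "decision tree":       DecisionTreeClassifier(),
    "random forest":       RandomForestClassifier(),
    "gradient boosting":   GradientBoostingClassifier(),
}
for name, clf in models.items():
    clf.fit(X, z)                                    # in-sample, as stated
    z_grid = clf.predict(grid).reshape(gx.shape)     # plot this for one row
    print(f"{name}: in-sample accuracy {clf.score(X, z):.3f}")
```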

As you can see, I’ve chosen some common geometrical shapes and seen which methods are good at identifying those. A few pertinent observations:

  1. Logistic regression and linear SVM are broadly similar, and both are shit for this kind of dataset. Being linear models, they fail to deal with non-linear patterns.
  2. SVM with RBF kernel is better, but it fails when there are multiple “true regions” in the dataset. At least it’s good at figuring out some non-linear patterns. However, it can’t figure out the triangle or square – it draws curves around them, instead.
  3. Naive Bayes (I’ve never understood this even though I’m pretty good at Bayesian statistics, but I understand it’s a commonly used technique; and since I’ve used default parameters, I’m not even sure in what sense it is “Bayesian”) can identify some patterns but does badly when there are disjoint regions where Z is true.
  4. Ensemble methods such as Random Forests and Gradient Boosting do rather well on all the given inputs, for both polygons and curves. AdaBoost mostly does well but trips up on the hyperbola.
  5. For some reason, Lasso fails to give an output (in the true spirit of pile-stirring, I didn’t explore why). Ridge is again a linear regression method, and so does badly on this non-linear dataset.
  6. Neural networks (a multi-layer perceptron, to be precise) do reasonably well, but can’t figure out the sharp edges of the polygons.
  7. Decision trees again do rather well. I’m pleasantly surprised that they pick up and classify the disjoint sets (multi-circle and hyperbola) correctly. Maybe it’s the way scikit-learn implements them?

Of course, the datasets that one comes across in real life are never such simple geometrical figures, but I hope that this set can give you some idea on what techniques to use where.

At least I hope that this makes you think about the suitability of different techniques for the data rather than simply applying all the techniques you know and then picking the one that performs best on your given training and test data.

That would count as nothing different from p-hacking, and there’s an XKCD for that as well!

Source: https://xkcd.com/882/

Astrology and Data Science

The discussion goes back some 6 years, when I’d first started setting up my data and management consultancy practice. Since I’d freshly quit my job to set up the said practice, I had plenty of time on my hands, and the wife suggested that I spend some of that time learning astrology.

Considering that I’ve never been remotely religious or superstitious, I found this suggestion preposterous (I had a funny upbringing in the matter of religion – my mother was insanely religious (including following a certain Baba), and my father was insanely rationalist, and I kept getting pulled in both directions).

Now, the wife has some (indirect) background in astrology. One of her aunts is an astrologer, and specialises in something called “prashNa shaastra”, where the prediction is made based on the time at which the client asks the astrologer a question. My wife believes this has resulted in largely correct predictions (though I suspect a strong dose of confirmation bias there), and (very strangely to me) seems to believe in the stuff.

“What’s the use of studying astrology if I don’t believe in it one bit”, I asked. “Astrology is very mathematical, and you are very good at mathematics. So you’ll enjoy it a lot”, she countered, sidestepping the question.

We went off into a long discussion on the origins of astrology, and how it resulted in early developments in astronomy (necessary in order to precisely determine the position of planets), and so on. The discussion got involved, with many digressions, as discussions of this sort tend to do. And as you might expect with such discussions, my wife threw a curveball: “You know, you say you’re building a business based on data analysis. Isn’t data analysis just like astrology?”

I was stumped (ok I know I’m mixing metaphors here), and that ended the discussion there.

Until I decided to bring it up recently. As it turns out, once again (after a brief hiatus when I decided I’d do a job) I’m in the process of setting up a data and management consulting business. The difference is that this time I’m in London, and that “data science” is now a thing (it wasn’t in 2011). And over the last year or so I’ve been kinda disappointed to see what goes on in the name of “data science” around me.

This XKCD cartoon (which I’ve shared here several times) encapsulates it very well. People literally “pour data into a machine learning system” and then “stir the pile” hoping for the results.

Source: https://xkcd.com/1838/

In the process of applying fairly complex “machine learning” algorithms, I’ve seen people not really bother about whether the analysis makes intuitive sense, or if there is “physical meaning” in what the analysis says, or if the correlations actually imply causation. It’s blind application of “run the data through a bunch of scikit-learn models and accept the output”.

And this is exactly how astrology works. There are a bunch of predictor variables (the positions of different “planets” in various parts of the “sky”). There is the observed variable (whether some disaster happened or not, basically), which is nicely in binary format. And then some of our ancients did some data analysis on this, trying to identify combinations of predictors that predicted the output (unfortunately they didn’t have the power of statistics or computers, so in that sense the models were limited). And then they simply accepted the outputs, without asking why it makes sense that the position of Jupiter at the time of a wedding should affect how the marriage will go.

So I brought up the topic of astrology and data science again recently, saying “OK after careful analysis I admit that astrology is the oldest form of data science”. “That’s not what I said”, the wife countered. “I said that data science is new age astrology, and not the other way round”.

It’s hard to argue with that!

Weighting indices

One of the biggest recent developments in finance has been the rise of index investing. The basic idea of indexing is that rather than trying to beat the market, a retail investor should simply invest in a “market index”, and net of fees they are likely to perform better than they would if they were to use an active manager.

Indexing has become so popular over the years that researchers at Sanford Bernstein, an asset management firm, have likened it to being “worse than Marxism”. People have written dystopian fiction about “the last active manager”. And so on.

And as Matt Levine keeps writing in his excellent newsletter, the rise of indexing means that the balance of power in the financial markets is shifting from asset managers to the people who build indices. The context here is that because a lot of people now simply invest “in the index”, deciding which stocks get to be part of an index can determine people’s appetite for those stocks, and thus their performance.

So, for example, you have indexers who want to leave stocks without voting rights (such as those of SNAP) out of indices. Some other indexers want to leave extra-large companies (such as a hypothetically public Saudi Aramco) out of the index. And then there are people who believe that the way conventional indices are built is incorrect, and instead argue in favour of an “equally weighted index”.

While one can theoretically just put together a bunch of stocks, call it an “index” and sell it to investors making them believe that they’re “investing in the index” (since that is now a thing), not every such “index” is really an index.

Last week, while trying to understand what the deal is with “smart beta” (a term people in the industry throw around a fair bit, though not too many are clear about what it means), I stumbled upon this excellent paper by MSCI on smart beta and factor investing.

About a decade ago, the Nifty (India’s flagship index) changed the way it was computed. Earlier, stocks in the Nifty were weighted based on their overall market capitalisation. From 2009 onwards, the weights of the stocks in the Nifty have been proportional to their “free float market capitalisation” (that is, the stock price multiplied by the number of shares held by the “public”, i.e. non-promoters).

Back then I hadn’t understood the significance of the change – apart from making the necessary changes to the algorithm I was running at a hedge fund to take the new weights into account, that is. Reading the MSCI paper made me realise the sanctity of weighting by free float market capitalisation when building an index.

The basic idea of indexing is that you don’t make any investment decisions, and instead simply “follow the herd”. Essentially you allocate your capital across stocks in exactly the same proportion as the rest of the market. In other words, the index needs to weight stocks in the same proportion that the broad market owns them.

And the free float market capitalisation, which is basically the total value of the stock held by the “public” (or non-promoters), represents the allocation of capital by the market as a whole in favour of that particular stock. By weighting stocks in the ratio of their free float market capitalisations, we essentially mimic the way the broad market has allocated capital across different companies.
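
As a toy sketch of the computation (the companies and numbers below are made up for illustration):

```python
# Free float market cap weighting with made-up numbers: the index weight of
# each stock is its free float market cap divided by the total across stocks.
stocks = {
    # name: (price, total shares outstanding, promoter holding fraction)
    "A": (100.0, 1_000_000, 0.50),
    "B": (250.0, 400_000, 0.25),
    "C": (40.0, 5_000_000, 0.70),
}

free_float_mcap = {
    name: price * shares * (1 - promoter)
    for name, (price, shares, promoter) in stocks.items()
}
total = sum(free_float_mcap.values())
weights = {name: mcap / total for name, mcap in free_float_mcap.items()}

for name, w in weights.items():
    print(f"{name}: {w:.1%}")   # the stock's weight in the index
```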

Thus, only a broad market index that is weighted by free float market capitalisation counts as “indexing” as far as passive investing is concerned. Investing in stocks in any other combination or ratio means the investor is expressing her views or preferences on the relative performance of stocks that are different from the market’s preferences.

So if you invest in a sectoral index, you are not “indexing”. If you invest in an index that is weighted differently than by free float market cap (such as the Dow Jones Industrial Average), you are not indexing.

One final point – you might wonder why indices have a finite number of stocks (such as the S&P 500 or Nifty 50) if true indexing means reflecting the market’s capital allocation across all stocks, not just a few large ones.

The reason we cut off after a point is that beyond it, the weights of the remaining stocks become so small that tracking them perfectly requires a very large investment. For a retail investor seeking to index, following the “entire market” would thus mean a significant “tracking error”. In other words, the 50 or 500 stocks that make up the index are a good representation of the market at large, and tracking these indices, as long as they are free float market capitalisation weighted, is the same as investing without having a view.
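
A back-of-envelope sketch of why (the price and weight below are assumed numbers):

```python
# Assumed numbers: a stock with a tiny index weight and a chunky share price.
# To hold even one share at the correct proportion, the total corpus must be
# at least share_price / index_weight.
share_price = 1000.0     # price of one share of a small index constituent
index_weight = 0.0001    # its weight in the index: 0.01%

min_corpus = share_price / index_weight
print(f"Corpus needed to hold this stock at its index weight: {min_corpus:,.0f}")
# => 10,000,000 – below this, the stock gets dropped, causing tracking error
```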

The Derick Parry management paradigm

Before you ask, Derick Parry was a West Indian cricketer. He finished his international playing career before I was born, partly because he bowled spin at a time when the West Indies usually played four fearsome fast bowlers, and partly because he went on rebel tours to South Africa.

That, however, doesn’t mean that I never watched him play – there was a “masters” series sometime in the mid 1990s when he played as part of the ‘West Indies masters” team. I don’t even remember who they were playing, or where (such series aren’t archived well, so I can’t find the score card either).

All I remember is that Parry was batting along with Larry Gomes, and the West Indies Masters were chasing a modest target. Parry is relevant to our discussion because of the commentator’s (don’t remember who – it was an Indian guy) repeated descriptions of how he should play.

“Parry should not bother about runs”, the commentator kept saying. “He should simply use his long reach and smother the spin and hold one end up. It is Gomes who should do the scoring”. And incredibly, that’s how West Indies Masters got to the target.

So the Derick Parry management paradigm consists of eschewing all the “interesting” or “good” or “impactful” work (“scoring”, basically; no pun intended), and simply focussing on holding one end up, or providing support. It wasn’t that Parry couldn’t score – he had a Test batting average of 22 – but on that day the commentator wanted him to simply hold one end up and let the more accomplished batsman do the scoring.

I’ve seen this happen at various levels, but it usually happens at the intra-company level. There will be one team which will explicitly not work on the more interesting part of the problem, and instead simply “provide support” to another team that works on it. In a lot of cases it is not that the “supporting team” doesn’t have the ability or skills to execute the task end-to-end. It just so happens that they are part of an organisation which is “not supposed to do the scoring”. Most often, this kind of relationship is seen in companies with offshore units – the offshore unit sticks to providing support to the onshore unit, which does the “scoring”.

In some cases, the Derick Parry school extends to inter-company deals as well, usually as a way to win the business. Basically, if you are trying to win an outsourcing contract, you don’t want to be seen doing something that the client considers “core business”. And so even if you’re fully capable of doing that work, you suppress that part of your offering and only provide support. The plan in some cases is to do a Mustafa’s camel, but in most cases that doesn’t succeed.

I’m not offering any comment on whether the Derick Parry strategy of management is good or not. All I’m doing here is attaching this oft-used strategy to a name, one that is mostly forgotten.

Chiltu

If my mother had been alive when I got married, I’m not sure she would have been too happy that I was marrying someone named Pinky. At the least, she would have insisted that we call Priyanka by another name.

The reason for this is that for my mother, the “default Pinky” was her friend Girija’s dachshund. I might have told you about “default names” before – basically, for every name, there is one person with that name who you instinctively think of. While the default person attached to a name can change over time, at any point of time there is only one default.

And because of this, when I know nothing about a person apart from his/her name, I form a Bayesian prior image which reflects that of the default person with the same name. And I assume this is true of a lot of people – you judge other people by their names in the absence of other information.

So considering that my mother was my mother, and thus also followed the practice of judging people by her corpus of “default names”, she wouldn’t have wanted a daughter-in-law whose nickname defaulted to a dog, even if it were a rather friendly dachshund.

Anyway, this is not what the post is about. So while Pinky was Girija aunty’s longstanding pet, she wasn’t her only dog. Periodically she would take in some other dogs, though none of them lasted anywhere as long as Pinky did (I don’t ever remember meeting any of the other dogs more than once). However, one of them is hard to forget.

He was an Indian pie-dog named Chiltu. He was quite young, but thanks to his breed, he already towered over Pinky. So it turned out that whenever they were fed, Chiltu would finish his portion well before Pinky finished hers, and then go for Pinky’s food as well.

Now don’t ask me why I remember this. But I remember telling this story to “my Pinky” a few years back when I had finished eating some rather tasty food much quicker than her. And I remember telling her that day that I would “do a Chiltu” – which is basically to go after Pinky’s food once I had finished my own food.

And that name has stuck. Every time one of us beats the other to eating something tasty, and then goes for the other’s portion, we simply say “Chiltu”.

My mother is long gone. Girija aunty has been gone for longer. Girija aunty’s dog Pinky has been gone for even longer. And Chiltu didn’t live with her for too long. But then Chiltu’s name, eternally associated with this practice, lives on!