Algorithmic curation

When I got my first smartphone (a Samsung Galaxy Note 2) in 2013, one of the first apps I installed on it was Flipboard. I’d seen the app while checking out some phones at either the Apple or Samsung retail outlets close to my home, and it seemed like a rather interesting idea.

For a long time, Flipboard was my go-to app to check the day's news, as it conveniently categorised news into "tech", "business" and "sport", learnt my preferences and fed me stuff I wanted. And then, after some update, it suddenly stopped working for me – it started serving too much stuff I didn't want to read about, and when I tuned my feed (by "following" and "unfollowing" topics), it progressively got worse.

I stopped using it some 2 years back, but out of curiosity started using it again recently. While it did throw up some nice articles, there is too much unwanted stuff in the app. More precisely, there’s a lot of “clickbaity” stuff (“10 things about Narendra Modi you would never want to know” and the like) in my feed, meaning I have to wade through a lot of such articles to find the occasional good ones.

(Aside: I dedicate about half a chapter to this phenomenon in my book. The technical term is "congestion". I talk about it in the context of markets in relationships and real estate.)

Flipboard is not the only one. I use this app called Pocket to bookmark long articles and read later. A couple of years back, Pocket started giving “recommendations” based on what I’d read and liked. Initially it was good, and mostly curated from what my “friends” on Pocket recommended. Now, increasingly I’m getting clickbaity stuff again.

I stopped using Facebook a long time before they recently redesigned their newsfeed (to give more weight to friends’ stuff than third party news), but I suspect that one of the reasons they made the change was the same – the feed was getting overwhelmed with clickbaity stuff, which people liked but didn’t really read.

Basically, there seems to be a widespread problem in a lot of automatically curated news feeds. To put it another way, the clickbaity websites seem to have done too well in terms of gaming whatever algorithms the likes of Facebook, Flipboard and Pocket use to build their automated recommendations.

And more worryingly, with all these curators starting to do badly around the same time (OK, this is my empirical observation, and given the few data points I might be wrong), it suggests that all these automated curators use very similar algorithms! And that can't be a good thing.

The Anti-Two Pizza Rule

So Amazon supposedly has a “two pizza rule” to limit the size of meetings – the convention is that two pizzas should be sufficient to feed all participants in any meeting. While pizza is not necessarily served at most meetings, the rule effectively implies that a meeting can’t have more than seven or eight people.

The point of the rule is not hard to see – a meeting that has too many people will inevitably have people who are not contributing, and it’s a waste of their time. Limiting meeting size also means cutting total time employees spend in meetings, meaning they can get more shit done.

While this is indeed a noble “rule” in a corporate setting, it just doesn’t work for parties. In fact, after having analysed lots of parties I’ve either hosted or attended over the years, and after an especially disastrous party not so long ago (I’ve waited a random amount of time since that party before writing this so as to not offend the hosts), I hereby propose the “anti two pizza rule” for parties.

While five to eight people is a good number for a meeting (enough people contributing, and no deadweight), the range doesn't work at all for more social gatherings. The problem is that with this number, it is not clear whether the gathering should remain as one group or split into multiple groups.

When you have a "one pizza party" (5-6 people or fewer), you have one tight group (no pun intended), and assuming that people get along with each other, you're likely to have a good time.

When you have a "three pizza party" (more than 10 people), it's intuitive for the gathering to break up into multiple groups, and if things go well, these groups will be fluid and everyone will have a good time. Such a gathering also allows people to test the waters with multiple co-attendees and then settle on the mini-group they'll end up spending most of their time with.

A two-pizza party (6-10 people), on the other hand, falls between two stools. One group means there will be people left out of the conversation without respite. In such a small gathering, it is also not easy to break out of the main group and start your own group (seating arrangements matter here, too). And so while some attendees (the "core group") might end up having fun, the party doesn't really work for most of the people attending.

So, the next time you're hosting a party, do yourself and your guests a favour and ensure that you don't end up with between 6 and 10 people at the party. Either fewer or more is fine!

You might want to read this other post I’ve written on coordinating guest lists for birthday parties.

FaceTime Baby

My nephew Samvit, born in 2011, doesn't talk much on the phone. It's possibly because he never got used to the phone as a baby, but I've never been able to have a decent phone conversation with him (we get along really well when we meet, though). He talks a couple of lines and hands over the phone to his mother and runs off. If it's a video call, he appears, says hi and disappears.

Berry (born in 2016), on the other hand, seems to have in a way “leapfrogged” the phone. We moved to London when she was five and a half months old, and since then we’ve kept in touch with my in-laws and other relatives primarily through video chat (FaceTime etc.). And so Berry has gotten used to seeing all these people on video, and has become extremely comfortable with the medium.

For example, when we were returning from our last Bangalore trip in December, we were worried that Berry would miss her grandparents tremendously. As it turned out, we landed in London and video called my in-laws, and Berry was babbling away as if there was no change in scene!

Berry has gotten so used to video calling that she doesn't seem to get the "normal" voice call. To be sure, she loves picking up the phone, holding it against her ear, saying "hello" and making pretend conversations (apparently she learnt this at her day care). But give her a phone and ask her to talk, and she goes quiet unless there's another person appearing on the screen.

Like there’s this one aunt of mine who is so tech-phobic that she doesn’t use video calls. And every time I call her she wants to hear Berry speak, except that Berry won’t speak because there is nobody on the screen! I’m now trying to figure out how to get this aunt to get comfortable with video calling just so that Berry can talk to her!


In that sense, Berry is a “video call” native. And I wouldn’t be surprised if it turns out that she’ll find it really hard to get comfortable with audio calls later on in life.

I’ll turn into one uncle now and say “kids nowadays… “

Religion and survivorship bias

Biju Dominic of FinalMile Consulting has a piece in Mint about "what CEOs can learn from religion". In it, he says:

Despite all the hype, the vast majority of these so-called highly successful, worthy of being emulated companies, do not survive even for few decades. On the other hand, religion, with all its inadequacies, continues to survive after thousands of years.

This is a fallacious comparison.

Firstly, comparing “religion” to a particular company isn’t dimensionally consistent. A better comparison would be to compare at the conceptual level – such as comparing “religion” to “joint stock company”. And like the former, the latter has done rather well for 300 years now, even if specific companies may fold up after a few years.

The other way to make an apples-to-apples comparison is to compare a particular company to a particular religion. And this is where survivorship bias comes in.

Most of the dominant religions of today are hundreds, if not thousands, of years old. In the course of their journey to their present-day strength, they have first established their own base and then fought off competition from other upstart religions.

In other words, when Dominic talks about "religion" he is only taking into account religions that have displayed memetic fitness over a really long period. What he fails to take into account are the thousands of startup religions that spring up every few years and then fade into nothingness.

Historically, such religions haven't been well documented, but that doesn't mean they didn't exist. In contemporary times, one need only look at the thousands of "babas" with cults all around India – each is leading his or her own "startup religion", and most of them are likely to sink without a trace.

Comparing the best in one class (religions that have survived and thrived over thousands of years) to the average of another class (the average corporation) just doesn’t make sense!


Stirring the pile efficiently

Warning: This is a technical post, and involves some code, etc. 

As I’ve ranted a fair bit on this blog over the last year, a lot of “machine learning” in the industry can be described as “stirring the pile”. Regular readers of this blog will be familiar with this image from XKCD by now:

Source: https://xkcd.com/1838/

Basically, people simply take datasets and apply all the machine learning techniques they have heard of (implementation is damn easy – scikit-learn allows you to implement just about any model in three similar-looking lines of code; see my code here to see how similar the implementations are).
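
To illustrate, here is a minimal sketch of that three-line pattern (this is not the code from my notebook; the dataset is just a stand-in generated with scikit-learn's make_classification). Swap the estimator class and nothing else needs to change:

```python
# A minimal sketch of the "three similar looking lines" pattern in scikit-learn.
# The dataset is a stand-in; in practice it is whatever pile you happen to be stirring.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# from sklearn.ensemble import RandomForestClassifier  # swap in any other estimator

X, z = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

model = LogisticRegression()    # line 1: pick a model (RandomForestClassifier(), SVC(), ...)
model.fit(X, z)                 # line 2: fit
predictions = model.predict(X)  # line 3: predict
```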

So I thought I'd help these pile-stirrers by giving some hints on what method to use for different kinds of data. I've over-simplified stuff, so assume that (a rough sketch of this setup in code follows the list):

  1. There are two predictor variables X and Y. The predicted variable “Z” is binary.
  2. X and Y are each drawn from a standard normal distribution.
  3. The predicted variable Z is "clean" – there is a region in the X-Y plane where Z is always "true" and another region where Z is always "false".
  4. So the idea is to see which machine learning techniques are good at identifying which kinds of geometrical figures.
  5. Everything is done “in-sample”. Given the nature of the data, it doesn’t matter if we do it in-sample or out-of-sample.
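
Here is roughly what that setup might look like in code (the sample size and the specific regions are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Assumptions 1 and 2: two predictors, each drawn from a standard normal distribution.
X = rng.standard_normal(n)
Y = rng.standard_normal(n)
features = np.column_stack([X, Y])   # the predictor matrix handed to each model

# Assumption 3: Z is "clean" -- true inside a geometric region, false outside.
# Two example shapes of the kind discussed below:
Z_circle = (X ** 2 + Y ** 2) < 1.0                 # a disc around the origin
Z_square = (np.abs(X) < 1.0) & (np.abs(Y) < 1.0)   # an axis-aligned square
```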

For those that understand Python (and every pile-stirrer worth his salt is excellent at Python), I’ve put my code in a nice Jupyter Notebook, which can be found here.

So this is what the output looks like. The top row shows the "true values" of Z. Then we have a row for each of the techniques we've used, which shows how well these techniques can identify the pattern given in the top row.

As you can see, I’ve chosen some common geometrical shapes and seen which methods are good at identifying those. A few pertinent observations:

  1. Logistic regression and linear SVM are broadly similar, and both are shit for this kind of dataset. Being linear models, they fail to deal with non-linear patterns.
  2. SVM with RBF kernel is better, but it fails when there are multiple "true regions" in the dataset. At least it's good at figuring out some non-linear patterns. However, it can't figure out the triangle or square – it draws curves around them instead.
  3. Naive Bayes (I've never fully understood this technique even though I'm pretty good at Bayesian statistics, but I understand it is commonly used; and since I've used default parameters, I'm not sure in what sense it is "Bayesian") can identify some stuff, but does badly when there are disjoint regions where Z is true.
  4. Ensemble methods such as Random Forests and Gradient Boosting do rather well on all the given inputs. They do well for both polygons and curves. AdaBoost mostly does well but trips up on the hyperbola.
  5. For some reason, Lasso fails to give an output (in the true spirit of pile-stirring, I didn't explore why). Ridge is again a regression method and so does badly on this non-linear dataset.
  6. Neural networks (a Multi Layer Perceptron, to be precise) do reasonably well, but can't figure out the sharp edges of the polygons.
  7. Decision trees again do rather well. I'm pleasantly surprised that they pick up and classify the disjoint sets (multi-circle and hyperbola) correctly. Maybe it's the way scikit-learn implements them?
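
Putting the pieces together, here is a rough sketch of the kind of comparison loop behind these observations. This is not the notebook itself – the data is regenerated from the earlier sketch, and the in-sample accuracy printed at the end is purely in the spirit of assumption 5 above, not a recommended way to evaluate models:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data as in the earlier sketch: two standard normal predictors,
# with Z true inside a disc around the origin.
rng = np.random.default_rng(42)
features = rng.standard_normal((2000, 2))
z = (features[:, 0] ** 2 + features[:, 1] ** 2) < 1.0

models = {
    "Logistic regression": LogisticRegression(),
    "Linear SVM": SVC(kernel="linear"),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(),
    "Gradient boosting": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Neural network (MLP)": MLPClassifier(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(),
}

for name, model in models.items():
    model.fit(features, z)               # fit each technique on the same data
    accuracy = model.score(features, z)  # in-sample accuracy (assumption 5)
    print(f"{name}: {accuracy:.3f}")
```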

Of course, the datasets that one comes across in real life are never such simple geometrical figures, but I hope that this set can give you some idea on what techniques to use where.

At least I hope that this makes you think about the suitability of different techniques for the data rather than simply applying all the techniques you know and then picking the one that performs best on your given training and test data.

That would count as nothing different from p-hacking, and there’s an XKCD for that as well!

Source: https://xkcd.com/882/

Astrology and Data Science

The discussion goes back some 6 years, when I’d first started setting up my data and management consultancy practice. Since I’d freshly quit my job to set up the said practice, I had plenty of time on my hands, and the wife suggested that I spend some of that time learning astrology.

Considering that I’ve never been remotely religious or superstitious, I found this suggestion preposterous (I had a funny upbringing in the matter of religion – my mother was insanely religious (including following a certain Baba), and my father was insanely rationalist, and I kept getting pulled in both directions).

Now, the wife has some (indirect) background in astrology. One of her aunts is an astrologer, and specialises in something called “prashNa shaastra“, where the prediction is made based on the time at which the client asks the astrologer a question. My wife believes this has resulted in largely correct predictions (though I suspect a strong dose of confirmation bias there), and (very strangely to me) seems to believe in the stuff.

“What’s the use of studying astrology if I don’t believe in it one bit”, I asked. “Astrology is very mathematical, and you are very good at mathematics. So you’ll enjoy it a lot”, she countered, sidestepping the question.

We went off into a long discussion on the origins of astrology, and how it resulted in early developments in astronomy (necessary in order to precisely determine the position of planets), and so on. The discussion got involved, and involved many digressions, as discussions of this sort might entail. And as you might expect with such discussions, my wife threw a curveball, “You know, you say you’re building a business based on data analysis. Isn’t data analysis just like astrology?”

I was stumped (OK, I know I'm mixing metaphors here), and that ended the discussion.

Until I decided to bring it up again recently. As it turns out, once again (after a brief hiatus when I decided to do a job) I'm in the process of setting up a data and management consulting business. The difference is that this time I'm in London, and that "data science" is now a thing (it wasn't in 2011). And over the last year or so I've been kinda disappointed to see what goes on in the name of "data science" around me.

This XKCD cartoon (which I’ve shared here several times) encapsulates it very well. People literally “pour data into a machine learning system” and then “stir the pile” hoping for the results.

Source: https://xkcd.com/1838/

In the process of applying fairly complex "machine learning" algorithms, I've seen people not really bother about whether the analysis makes intuitive sense, whether there is "physical meaning" in what the analysis says, or whether the correlations actually indicate causation. It's blind application of "run the data through a bunch of scikit-learn models and accept the output".

And this is exactly how astrology works. There are a bunch of predictor variables (the positions of different "planets" in various parts of the "sky"). There is the observed variable (whether some disaster happened or not, basically), which is nicely in binary format. And then some of our ancients did some data analysis on this, trying to identify combinations of predictors that predicted the output (unfortunately they didn't have the power of statistics or computers, so in that sense the models were limited). And then they simply accepted the outputs, without questioning why it makes sense that the position of Jupiter at the time of your wedding should affect how your marriage will go.

So I brought up the topic of astrology and data science again recently, saying “OK after careful analysis I admit that astrology is the oldest form of data science”. “That’s not what I said”, the wife countered. “I said that data science is new age astrology, and not the other way round”.

It’s hard to argue with that!

Weighting indices

One of the biggest recent developments in finance has been the rise of index investing. The basic idea of indexing is that rather than trying to beat the market, a retail investor should simply invest in a “market index”, and net of fees they are likely to perform better than they would if they were to use an active manager.

Indexing has become so popular over the years that researchers at Sanford Bernstein, an asset management firm, have likened it to being “worse than Marxism“. People have written dystopian fiction about “the last active manager”. And so on.

And as Matt Levine keeps writing in his excellent newsletter, the rise of indexing means that the balance of power in the financial markets is shifting from asset managers to people who build indices. The context here is that because now a lot of people simply invest “in the index”, determining which stock gets to be part of an index can determine people’s appetite for the stock, and thus its performance.

So, for example, you have indexers who want to leave stocks without voting rights (such as those of SNAP) out of indices. Some other indexers want to leave extra-large companies (such as a hypothetically public Saudi Aramco) out of the index. And then there are people who believe that the way conventional indices are built is incorrect, and instead argue in favour of an "equally weighted index".

While one can theoretically just put together a bunch of stocks, call it an "index" and sell it to investors by making them believe they're "investing in the index" (since that is now a thing), the fact is that not every index is an index.

Last week, while trying to understand what the deal with "smart beta" is (a term people in the industry throw around a fair bit, but one that not too many people are clear about), I stumbled upon this excellent paper by MSCI on smart beta and factor investing.

About a decade ago, the Nifty (India's flagship index) changed the way it was computed. Earlier, stocks in the Nifty were weighted based on their overall market capitalisation. From 2009 onwards, the weights of the stocks in the Nifty have been proportional to their "free float market capitalisation" (that is, the stock price multiplied by the number of shares held by the "public", i.e. non-promoters).

Back then I hadn’t understood the significance of the change – apart from making the necessary changes in the algorithm I was running at a hedge fund to take into account the new weights that is. Reading the MSCI paper made me realise the sanctity of weighting by free float market capitalisation in building an index.

The basic idea of indexing is that you don’t make any investment decisions, and instead simply “follow the herd”. Essentially you allocate your capital across stocks in exactly the same proportion as the rest of the market. In other words, the index needs to track stocks in the same proportion that the broad market owns it.

And the free float market capitalisation, which is basically the total value of the stock held by the "public" (or non-promoters), represents the allocation of capital by the total market in favour of the particular stock. And by weighting stocks in the ratio of their free float market capitalisation, we are essentially mimicking the way the broad market has allocated capital across different companies.
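
As a toy illustration (the companies and numbers below are entirely made up, purely to show the arithmetic):

```python
# Hypothetical companies and numbers, purely to illustrate free-float weighting.
stocks = {
    # name: (share price, total shares outstanding, fraction held by non-promoters)
    "Alpha Ltd": (100.0, 1_000_000, 0.40),
    "Beta Ltd":  (50.0,  5_000_000, 0.90),
    "Gamma Ltd": (200.0,   500_000, 0.75),
}

# Free float market cap = price * total shares * free-float fraction.
free_float_mcap = {
    name: price * shares * float_fraction
    for name, (price, shares, float_fraction) in stocks.items()
}

total = sum(free_float_mcap.values())
index_weights = {name: mcap / total for name, mcap in free_float_mcap.items()}

for name, weight in index_weights.items():
    print(f"{name}: {weight:.1%}")   # each stock's weight in the (toy) index
```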

Thus, only a broad market index that is weighted by free float market capitalisation counts as "indexing" as far as passive investing is concerned. Investing in stocks in any other combination or ratio means the investor is expressing views or preferences on the relative performance of stocks that are different from the market's preferences.

So if you invest in a sectoral index, you are not “indexing”. If you invest in an index that is weighted differently than by free float market cap (such as the Dow Jones Industrial Average), you are not indexing.

One final point – you might wonder why indices have a finite number of stocks (such as the S&P 500 or Nifty 50) if true indexing means reflecting the market’s capital allocation across all stocks, not just a few large ones.

The reason we cut off after a point is that beyond it, the weights of the remaining stocks become so small that replicating them exactly would require a very large total investment. For a retail investor seeking to index, following the "entire market" would thus mean either leaving out the smallest stocks or holding them in the wrong proportions – in other words, a significant "tracking error". The 50 or 500 stocks that make up the index are a good representation of the market at large, and tracking these indices, as long as they are free float market capitalisation weighted, is effectively the same as investing without having a view.