Beer and diapers: Netflix edition

When we started using Netflix last May, we created three personas for the three of us in the family – “Karthik”, “Priyanka” and “Berry”. At that time we didn’t realise that there was already a pre-created “kids” (subsequently renamed “children” – don’t know why that happened) persona there.

So while Priyanka and I mostly use our respective personas to consume Netflix (our interests in terms of video content hardly intersect), Berry uses both her profile and the kids profile for her stuff (of course, she’s too young to put it on herself. We do it for her). So over the year, the “Berry” profile has been mostly used to play Peppa Pig, and the occasional wildlife documentary.

Which is why we were shocked the other day to find that “Real life wife swap” had been recommended on her account. Yes, you read that right. We muttered a word of abuse about Netflix’s machine learning algorithms and since then have only used the “kids” profile to play Berry’s stuff.

Since then I’ve been wondering what made Netflix recommend “real life wife swap” to Berry. Surely, it would have been clear to Netflix that while it wasn’t officially classified as one, the Berry persona was a kid’s account? And even if it didn’t, didn’t the fact that the account was used for watching kids’ stuff lead the collaborative filtering algorithms at Netflix to recommend more kids’ stuff? I’ve come up with various hypotheses.

Since I’m not Netflix, and I don’t have their data, I can’t test it, but my favourite hypothesis so far involves what is possibly the most commonly cited example in retail analytics – “beer and diapers“. In this most-likely-apocryphal story, a supermarket chain discovered that beer and diapers were highly likely to appear together in shopping baskets. Correlation led to causation and a hypothesis was made that this was the result of tired fathers buying beer on their diaper shopping trips.

So the Netflix version of beer-and-diapers, which is my hypothesis, goes like this. Harrowed parents are pestered by their kids to play Peppa Pig and other kiddie stuff. The parents are so stressed that they don’t switch to the kid’s persona, and instead play Peppa Pig or whatever from their own accounts. The kid is happy and soon goes to bed. And then the parent decides to unwind by watching some raunchy stuff like “real life wife swap”.

Repeat this story in enough families, and you have a strong enough pattern that accounts not explicitly classified as “kids/children” have strong activity of both kiddie stuff and adult content. And when you use an account not explicitly mentioned as “kids” to watch kiddie stuff, it gets matched to these accounts that have created the pattern – Netflix effectively assumes that watching kid stuff on an adult account indicates that the same account is used to watch adult content as well. And so serves it to Berry!

Machine learning algorithms basically work on identifying patterns in data, and then fitting these patterns on hitherto unseen data. Sometimes the patterns make sense – like Google Photos identifying you even in your kiddie pics. Other times, the patterns are offensive – like the time Google Photos classified a black woman as a “gorilla“.

Thus what is necessary is some level of human oversight, to make sure that the patterns the machine has identified makes some sort of sense (machine learning purists say this is against the spirit of machine learning, since one of the purposes of machine learning is to discover patterns not perceptible to humans).

That kind of oversight at Netflix would have suggested that you can’t tag a profile to a “kiddie content AND adult content” category if the profile has been used to watch ONLY kiddie content (or ONLY adult content). And that kind of oversight would have also led Netflix to investigate issues of users using “general” account for their kids, and coming up with an algorithm to classify such accounts as kids’ accounts, and serve only kids’ content there.

It seems, though, that algorithms run supreme at Netflix, and so my baby daughter gets served “real life wife swap”. Again, this is all a hypothesis (real life wife swap being recommended is a fact, of course)!

Genetic Algorithms

I first learnt about Genetic Algorithms almost exactly thirteen years ago, when it was taught to me by Prof. Deepak Khemani as part of a course on “artificial intelligence”. I remember liking the course a fair bit, and took a liking to the heuristics and “approximate solutions” after the mathematically intensive algorithms course of the previous semester.

The problem with the course, however, was that it didn’t require us to code the algorithms we had learnt (for which we were extremely thankful back then, since in term 5 of Computer Science at IIT Madras, this was pretty much the only course that didn’t involve too many programming assignments).

As a result, while I had learnt all these heuristics (hill climbing, simulated annealing, taboo search, genetic algorithms, etc.) fairly well in theory, I had been at a complete loss as to how to implement any of them. And so far, during the course of my work, I had never had an opportunity to use any of these techniques. Until today that is.

I can’t describe the problem here since this is a client assignment, but when I had to pick a subset from a large set that satisfied certain properties, I knew I needed a method that would reach the best subset quickly. A “fitness function” quickly came to mind and it was obvious that I should use genetic algorithms to solve the problem.

The key with using genetic algorithms is that you need to be able to code the solution in the form of a string, and then define functions such as “crossover” and “mutation”. Given that I was looking for a subset, coding it as a string was rather easy, and since I had unordered subsets, the crossover was also easy – basic random number generation. Within fifteen minutes of deciding I should use GA, the entire algorithm was in my head. It was only a question of implementing it.

As I started writing the code, I started getting fantasies of being able to finish it in an hour and then write a blog post about it. As it happened, it took much longer. The first cut took some three hours (including some breaks), and it wasn’t particularly good, and was slow to converge.  I tweaked things around a bit but things didn’t improve by much.

And that was when I realise that I had done the crossover wrong – when two strings have elements in common and need to be crossed over, I had to take care that elements in common did not repeat into the same “child” (needed the subsets to be of a certain length). So that needed some twist in the code. That done, the code still seemed inefficient.

I had been doing the crossover wrong. If I started off with 10 strings, I would form 5 pairs from them (each participating in exactly one pair) which would result in 10 new strings. And then I would put these 20 (original 10 and new 10) strings through a fitness test and discard the weakest 10. And iterate. The problem was that the strongest strings had as much of a chance of reproducing as the weakest. This was clearly not right.

So I tweaked the code so that the fitter strings had a higher chance of reproducing than the less fit. This required me to put the fitness test at the beginning of each iteration rather than the end. I had to refactor the code a little bit to make sure I didn’t repeat computations. Now I was drawing pairs of strings from the original “basket” and randomly crossing them over. And putting them through the fitness test. And so forth.

I’m extremely happy with the results of the algorithm. I’ve got just the kind of output that I had expected. More importantly, I was extremely happy with the process of coding the whole thing in. I did the entire coding in R, which is what I use for my data analysis (data size meant I didn’t need anything quicker).

The more interesting part is that this only solved a very small part of the problem I’m trying to solve for my client. Tomorrow I’m back to solving a different part of the problem. Genetic algorithms have served their purpose. Back when I started this assignment I had no clue I would be using genetic algorithms. In fact, I had no clue what techniques I might use.

Which is why I get annoyed when people ask me what kind of techniques I use in my problem solving. Given the kind of problems I take on, most will involve a large variety of math, CS and statistics techniques, each of which will only play a small role in the entire solution. This is also the reason I get annoyed when people put methods they are going to use to solve the problem on their pitch-decks. To me, that gives an impression that they are solving a toy version of the problem and not the real problem – or that the consultants are grossly oversimplifying the problem to be solved.

PS: Much as some people might describe it that way, I wouldn’t describe Genetic Algorithms as “machine learning”. I think there’s way too much intervention on the part of the programmer for it to be described thus.

Making coding cool again

I learnt to code back in 1998. My aunt taught me the basics of C++, and I was fascinated by all that I could make my bad old x386 computer to do. Soon enough I was solving complex math problems, and using special ASCII characters to create interesting pattens on screen. It wasn’t long before I wrote the code for two players sitting on the same machine to play Pong. And that made me a star.

I was in a rather stud class back then (the school I went to in class XI had a reputation for attracting toppers), and after a while I think I had used my coding skills to build a reasonable reputation. In other words, coding was cool. And all the other kids also looked up to coding as a special skill.

Somewhere down the line, though I don’t remember when it was, coding became uncool. Despite graduating with a degree in Computer Science from IIT Madras, I didn’t want a “coding job”. I ended up with one, but didn’t want to take it, and so I wrote some MBA entrance exams, and made my escape that way.

By the time I graduated from my MBA, coding had become even more uncool. If you were in a job that required you to code, it was an indication that you were in the lowest rung, and thus not in a “management job”. Perhaps even worse, if your job required you to code, you were probably in an “IT job”, something that was back then considered as being a “dead end” and thus not a very preferred job. Thus, even if you coded in your job, you tended to downplay it. You didn’t want your peers to think you were either in a “bottom rung” job or in an “IT job”. So I wrote fairly studmax code (mostly using VB on Excel) but didn’t particularly talk about it when I met my MBA friends. As I moved jobs (they became progressively studder) my coding actually increased, but I continued to downplay the coding bit.

And I don’t think it’s just me. Thanks to the reasons explained above, coding is considered uncool among most MBA graduates. Even most engineering graduates from good colleges don’t find coding cool, for that is the job that their peers in big-name big-size dead-end-job software services companies do. And if people consider coding uncool, it has a dampening impact on the quality of talent that goes into jobs that involves coding. And that means code becomes less smart. And so forth.

So the question is how we can make coding cool again. I’m not saying it’s totally uncool. There are plenty of talented people who want to code, and who think it’s cool. The problem though is that the marginal potential coder is not taking to coding because he thinks that coding is not cool enough. And making coding cool will make more people want to take it up, which will lead to greater number of people take up this vocation!

Any ideas?

Hedgehogs and foxes: Or, a day in the life of a quant

I must state at the outset that this post is inspired by the second chapter of Nate Silver’s book The Signal and the Noise. In that chapter, which is about election forecasting, Silver draws upon the old Russian parable of the hedgehog and the fox. According to that story, the fox knows several tricks while the hedgehog knows only one – curling up into a ball. The story ends in favour of the hedgehog, as none of the tricks of the unfocused fox can help him evade the predator.

Most political pundits, says Silver, are like hedgehogs. They have just one central idea to their punditry and they tend to analyze all issues through that. A good political forecaster, however, needs to be able to accept and process any new data inputs, and include that in his analysis. With just one technique, this can be hard to achieve and so Silver says that to be a good political forecaster one needs to be a fox. While this might lead to some contradictory statements and thus bad punditry, it leads to good forecasts. Anyway, you can know about election forecasting from Silver’s book.

The world of “quant” and “analytics” which I inhabit is again similarly full of hedgehogs. You have the statisticians, whose solution for every problem is a statistical model. They can wax eloquent about Log Likelihood Estimators but can have trouble explaining why you should use that in the first place. Then you have the banking quants (I used to be one of those), who are proficient in derivatives pricing, stochastic calculus and partial differential equations, but if you ask them why a stock price movement is generally assumed to be lognormal, they don’t have answers. Then you have the coders, who can hack, scrape and write really efficient code, but don’t know much math. And mathematicians who can come up with elegant solutions but who are divorced from reality.

While you might make a career out of falling under any of the above categories, to truly unleash your potential as a quant, you should be able to do all. You should be a fox and should know each of these  tricks. And unlike the fox in the Old Russian fairy tale, the key to being a good fox is to know what trick to use when. Let me illustrate this with an example from my work today (actual problem statement masked since it involves client information).

So there were two possible distributions that a particular data point could have come from and I had to try and analyze which of them it came from (simple Bayesian probability, you might think). However, calculating the probability wasn’t so straightforward, as it wasn’t a standard function. Then I figured I could solve the probability problem using the inclusion-exclusion principle (maths again), and wrote down a mathematical formulation for it.

Now, I was dealing with a rather large data set, so I would have to use the computer, so I turned my mathematical solution into pseudo-code. Then, I realized that the pseudo-code was recursive, and given the size of the problem I would soon run out of memory. I had to figure out a solution using dynamic programming. Then, following some more code optimization, I had the probability. And then I had to go back to do the Bayesian analysis in order to complete the solution. And then present the solution in the form of a “business solution”, with all the above mathematical jugglery being abstracted from the client.

This versatility can come in handy in other places, too. There was a problem for which I figured out that the appropriate solution involved building a classification tree. However, given the nature of the data at hand, none of the off-the-shelf classification tree algorithms for were ideal. So I simply went ahead and wrote my own code for creating such trees. Then, I figured that classification trees are in effect a greedy algorithm, and can lead to getting stuck at local optima. And so I put in a simulated annealing aspect to it.

While I may not have in depth knowledge of any of the above techniques (to gain breadth you have to sacrifice depth), that I’m aware of a wide variety of techniques means I can provide the solution that is best for the problem at hand. And as I go along, I hope to keep learning more and more techniques – even if I don’t use them, being aware of them will lead to better overall problem solving.

Dropping out

Less than a semester into my undergrad (Bachelor of Technology in Computer Science and Engineering at IIT Madras) I wanted to drop out, and start work. I didn’t want to be an “engineer”.

I didn’t know why I’d to spend all my Thursday and Friday afternoons filing away at some piece of iron in the “fitting workshop”. I didn’t have the patience to draw three views of a random object in “engineering drawing”.

And I had the reputation of being one of the studdest programmers in my school. Apart from winning competitions here and there and doing well in acads, I had enormous respect from peers for my programming skills. Given that it was a “high-performance school” (which subjected its own 10th standard students to a test before admitting them to 11th) I guess this peer respect does carry some weight.

So, being good at math, and having the reputation of being a stud programmer, I didn’t know what I was doing studying “engineering”. I wanted to be a programmer, and I wanted to drop out and take up a job. My JEE rank counted almost as much as an IIT degree, I thought. I didn’t have the balls, and I continued.

In hindsight, I’m happy I didn’t drop out. By the end of my second year, I knew for sure that I DIDN’T want to be a programmer. While the theoretical aspects of Computer Science excited me (algo analysis and stuff), I had absolutely no patience for “systems”, or “computer engineering”. I was perhaps alone in my class in my love for Microsoft products (easy to use).

I realized then that I liked only the algorithmic aspect of programming, where one solves a (mostly math) problem and codes it up in a simple program. Huge complicated systems-intensive programming, making GUIs etc. didn’t inspire me at all.

Looking back, all that “major” (i.e. Computer Science and Engineering) stuff that I’ve learnt and internalized was learnt in my first two years of engineering. Of course several concepts that are part of CS&E are taught in the last two years, but I ended up not liking any of that.

Looking back, I do find it positive that I did all those “general engineering” courses. I do find it really positive that we had to do 12 compulsory credits in Humanities and Social Sciences, for that allowed me to discover what I was really interested in, and indirectly led me to doing my MBA.

I have only one regret. That I wasn’t able to switch streams sooner than I could. That IIT, being a one-dimensional technology oriented university, didn’t allow me to transfer credits to a course that I would’ve liked better, simply because it offered undergrad courses only in engineering.

There was a humanities department, where I discovered what I was interested in, but unfortunately it was a “minor” department. It’s been partly rectified now, with the setting up of integrated MA courses, in Economics, etc. (if that course existed back when I was studying, there’s a good chance I’d’ve transferred to it from CS&E). But it’s not enough.

Kids at 17 have no clue what they want to do. What we need are flexible full-scale universities, which allow you to switch from any branch to any other branch after two years of reasonably generalized study (the earlier branch can then contribute to “minor” credits). We need to stop putting our colleges in silos such as “engineering”, “arts and science”, etc. Only then would our universities be truly world class, even from an undergraduate point of view.

And looking back, I’m really happy I didn’t drop out.

Coding

Back when I was in school (11th/12th) I think I was an awesome coder. I think I was especially good at what they called as “logic coding”, i.e. coming up with algos. I used to experiment quite a bit (as much was possible with TurboC) and had a lot of fun too. I remember doing graphics in TurboC, making a “pong” game, brick breaker, and a lot of other cool stuff. For our 12th standard project, Hareesh and I built this totally awesome cricket scoring program, which we unfortunately didn’t take forward (and went to college instead).

It was my love for coding that meant I fought with my parents (who wanted me to study Electrical) and decided to study Computer Science at IIT Madras. And then I lost it. Somewhere along the way. I didn’t enjoy coding any more. Soon, I began to hate coding. I would love coding when I would write the odd program in “pure” C, or when I would participate in contests such as BITWise. But I’d completely lost it.

So over the last six to seven years (after I graduated from IIT) there have been occasions when I have thought I’ve regained my coding mojo, only to lose it again very soon. I’m still very proud of that Excel+VBA model that I had written in the very first week of my third job. But a couple of months later, I was hating coding again. And so it was while debugging a complicated piece of code at work this morning that I realize why I have this love-hate relationship with coding.

It’s simple – basically I hate coding for others. I hate writing code that others will read or use. I don’t mind writing code that others would use as a black box, of course. But I think writing code that others will read or use puts too many constraints on the way you code. My instinct is always to stop doing something when I’m personally satisfied with it, and with code it seems like I’m satisfied sooner than others would be satisfied with my code.

At a fundamental level, I like coding and I think I’m pretty good at it, so it isn’t something I want to give up. But then the formal processes and endless testing involved with writing code for others really kills joy (as does GUI, and Java). Code saves a lot of time, and helps “studdize” what might be otherwise fighter work, so I like doing it.

In an ideal world, I would be writing code that I would alone be using, AND profiting from it (I never intend to sell code; I intend to sell the results of the said code, however; that would mean no one else would read/use my code per se, so I can write it the way I want). Hopefully I’ll get there, sometime.

Arranged Scissors 13 – Pruning

Q: How do you carve an elephant?
A: Take a large stone and remove from it all that doesn’t look like an elephant

– Ancient Indian proverb, as told to us by Prof C Pandu Rangan during the Design of Algorithms course

As I had explained in a post a long time ago, this whole business of louvvu and marriage and all such follows a “Monte Carlo approach“. When you ask yourself the question “Do I want a long-term gene-propagating relationship with her?” , the answer is one of “No” or “Maybe”. Irrespective of how decisive you are, or how perceptive you are, it is impossible for you to answer that question with a “Yes” with 100% confidence.

Now, in Computer Science, the way this is tackled is by running the algorithm a large number of times. If you run the algo several times, and the answer is “Maybe” in each iteration, then you can put an upper bound on the probability that the answer is “No”. And with high confidence (though not 100%) you can say “Probably yes”. This is reflected in louvvu also – you meet several times, implicitly evaluate each other on several counts, and keep asking yourselves this question. And when both of you have asked yourselves this question enough times, and both have gotten consistent maybes, you go ahead and marry (of course, there is the measurement aspect also that is involved).

Now, the deal with the arranged marriage market is that you aren’t allowed to have too many meetings. In fact, in the traditional model, the “darshan” lasts only for some 10-15 mins. In extreme cases it’s just a photo but let’s leave that out of the analysis. In modern times, people have been pushing to get more time, and to get more opportunities to run iterations of the algo. Even then, the number of iterations you are allowed is bounded, which puts an upper bound on the confidence with which you can say yes, and also gives fewer opportunity for “noes”.

Management is about finding a creative solution to a system of contradictory constraints
– Prof Ramnath Narayanswamy, IIMB

So one way to deal with this situation I’ve described is by what can be approximately called “pruning”. In each meeting, you will need to maximize the opportunity of detecting a “no”. Suppose that in a normal “louvvu date”, the probability of a “no” is 50% (random number pulled out of thin air). What you will need to do in order to maximize information out of an “arranged date” (yes, that concept exists now) is to raise this probability of a “no” to a higher number, say 60% (again pulled out of thing air).

If you can design your interaction so as to increase the probability of detecting a no, then you will be able to extract more information out of a limited number of meetings. When the a priori rejection rate per date is 50%, you will need at least 5 meetings with consistent “maybes” in order to say “yes” with a confidence of over 50% (I’m too lazy to explain the math here), and this is assuming that the information you gather in one particular iteration is independent of all information gathered in previous iterations.

(In fact, considering that the amount of incremental information gathered in each subsequent iteration is a decreasing function, the actual number of meetings required is much more)

Now, if you raise the a priori probability of rejection in one particular iteration to 60%, then you will need only 4 independent iterations in order to say “yes” with a confidence of over 95% (and this again is by assuming independence).

Ignore all the numbers I’ve put, none of them make sense. I’ve only given them to illustrate my point. The basic idea is that in an “arranged date”, you will need to design the interaction in order to “prune” as much as possible in one particular iteration. Yes, this same thing can be argued for normal louvvu also, but there I suppose the pleasure in the process compensates for larger number of iterations, and there is no external party putting constraints.