Machine learning and degrees of freedom

For starters, machine learning is not magic. It might appear like magic when you see Google Photos automatically tagging all your family members correctly, down to the day of their birth. It might appear so when Siri or Alexa gives a perfect response to your request. And the way AlphaZero plays chess is almost human!

But no, machine learning is not magic. I’d made a detailed argument about that in the second edition of my newsletter (subscribe if you haven’t already!).

One way to think of it is that the output of a machine learning model (which could be anything from “does this picture contain a cat?” to “is the speaker speaking in English?”) is the result of a mathematical formula, whose parameters are unknown at the beginning of the exercise.

As the system gets “trained” (of late I’ve avoided using the word “training” in the context of machine learning, preferring to use “calibration” instead. But anyway…), the hitherto unknown parameters of the formula get adjusted in a manner that the formula output matches the given data. Once the system has “seen” enough data, we have a model, which can then be applied on unknown data (I’m completely simplifying it here).

The genius in machine learning comes in setting up mathematical formulae in a way that given input-output pairs of data can be used to adjust the parameters of the formulae. The genius in deep learning, which has been all the rage this decade, comes from a 30-year-old mathematical breakthrough called “back propagation”. The reason it took until a few years back for it to become a “thing” has to do with data availability and compute power (check this terrific piece in the MIT Tech Review about deep learning).

Within machine learning, the degree of complexity of a model can vary significantly. In an ordinary univariate least squares regression, for example, there are only two parameters the system can play with (the slope and intercept of the regression line). A simple “shallow” neural network, on the other hand, has thousands of parameters.
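To put rough numbers on this (the layer sizes below are hypothetical, purely for illustration): a univariate regression has exactly two parameters, while even a small fully connected network runs into the thousands.

```python
# Univariate least-squares regression: slope and intercept
regression_params = 2

# A small fully connected network: 10 inputs -> 64 -> 64 -> 1 output.
# Each layer contributes (inputs x outputs) weights plus one bias per output.
layer_sizes = [10, 64, 64, 1]
nn_params = sum(n_in * n_out + n_out
                for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(regression_params, nn_params)  # → 2 4929
```

Even this toy network has over two thousand times as many knobs to turn as the regression.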

Because a regression has so few parameters, the kinds of patterns the system can detect are rather limited (whatever you do, the system can only draw a straight line. Nothing more!). Thus, regression is applied only when you know that the underlying relationship is simple (and linear), or when you are deliberately trying to force-fit a linear model.

The upside of simple models such as regression is that because there are so few parameters to be adjusted, you need relatively few data points in order to adjust them to the required degree of accuracy.
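A minimal sketch of this, using NumPy's polyfit: with only two parameters to pin down, a handful of points from an exact linear relationship is enough to recover both.

```python
import numpy as np

# Five data points from an exact linear relationship y = 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 1.0

# Two free parameters, so five points pin them down precisely
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # → approximately 3.0 and 1.0
```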

As models get more and more complicated, the number of parameters increases, thus increasing the complexity of patterns that can be detected by the system. Close to one extreme, you have systems that see lots of current pictures of you and then identify you in your baby pictures.

Such complicated patterns can be identified because the system parameters have lots of degrees of freedom. The downside, of course, is that because the parameters start off having so much freedom, it takes that much more data to “tie them down”. The reason Google Photos can tag you in your baby pictures is partly down to the quantum of image data that Google has, which does an effective job of tying down the parameters. Google Translate similarly uses large repositories of multi-lingual text in order to “learn languages”.

Like most other things in life, machine learning also involves a tradeoff. It is possible for systems to identify complex patterns, but for that you need to start off with lots of “degrees of freedom”, and then use lots of data to tie down the variables. If your data is small, then you can only afford a small number of parameters, and that limits the complexity of patterns that can be detected.
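The tradeoff can be sketched with polynomials as a toy stand-in for model complexity: a five-parameter curve fitted to five slightly noisy points matches them perfectly, but away from the data it behaves far worse than the humble two-parameter line.

```python
import numpy as np

# Five points from y = 2x, with small fixed "noise"
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + np.array([0.1, -0.1, 0.2, -0.2, 0.0])

line = np.polyfit(x, y, deg=1)    # 2 parameters
curve = np.polyfit(x, y, deg=4)   # 5 parameters: passes through every point

# The flexible model nails the training data...
train_error_curve = np.abs(np.polyval(curve, x) - y).max()

# ...but goes wild away from it, while the line stays near the truth
x_new, y_true = 10.0, 20.0
err_line = abs(np.polyval(line, x_new) - y_true)
err_curve = abs(np.polyval(curve, x_new) - y_true)
print(train_error_curve, err_line, err_curve)
```

With only five data points, the five free parameters of the curve never get "tied down", and they end up fitting the noise.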

One way around this, of course, is to use your own human intelligence as a pre-processing step in order to set up parameters in a way that they can be effectively tuned by data. Gopi had a nice post recently on “neat learning versus deep learning”, which is relevant in this context.

Finally, there is the issue of spurious correlations. Because machine learning systems are basically mathematical formulae designed to learn patterns from data, spurious correlations in the input dataset can lead to the system learning random things, which can hamper its predictive power.

Data sets, especially ones that have lots of dimensions, can display correlations that appear at random, but if the input dataset shows enough of these correlations, the system will “learn” them as a pattern, and try to use them in predictions. And the more complicated your model gets, the harder it is to know what it is doing, and thus the harder it is to identify these spurious correlations!

And the thing with having too many “free parameters” (lots of degrees of freedom but without enough data to tie down the parameters) is that these free parameters are especially susceptible to learning the spurious correlations – for they have no other job.
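A quick simulation of the point above: when there are far more dimensions than data points, some feature will correlate strongly with a target that is, by construction, pure noise (a sketch using NumPy).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 20, 1000

X = rng.standard_normal((n_samples, n_features))  # 1000 random "features"
y = rng.standard_normal(n_samples)                # target: pure noise

# Correlation of each feature with the target
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

# Some feature "explains" the noise impressively well - purely by chance
print(np.abs(corrs).max())
```

A model with enough free parameters will happily latch on to that best-correlated feature, even though the relationship is entirely an accident of this particular sample.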

After all, machine learning systems are not human!

Biases, statistics and luck

Tomorrow Liverpool plays Manchester City in the Premier League. As things stand now I don’t plan to watch this game. This entire season so far, I’ve only watched two games. First, I’d gone to a local pub to watch Liverpool’s visit to Manchester City, back in September. Liverpool got thrashed 5-0.

Then in October, I went to Wembley to watch Tottenham Hotspur play Liverpool. The Spurs won 4-1. These two remain Liverpool’s only defeats of the season.

I might consider myself to be a mostly rational person but I sometimes do fall for the correlation-implies-causation bias, and think that my watching those games had something to do with Liverpool’s losses in them. Never mind that these were away games played against other top sides which attack aggressively. And so I have this irrational “fear” that if I watch tomorrow’s game (even if it’s from a pub), it might lead to a heavy Liverpool defeat.

And so I told Baada, a Manchester City fan, that I’m not planning to watch tomorrow’s game. And he got back to me with some statistics, which he’d heard from a podcast. Apparently it’s been 80 years since Manchester City did the league “double” (winning both home and away games) over Liverpool. And that it’s been 15 years since they’ve won at Anfield. So, he suggested, there’s a good chance that tomorrow’s game won’t result in a mauling for Liverpool, even if I were to watch it.

With the easy availability of statistics, it has become a thing among football commentators to supply them during commentary. And at first hearing, things like “never done this in 80 years” or “never done that in the last 15 years” sound compelling, and you’re inclined to believe that there is something to these numbers.

I don’t remember if it was Navjot Sidhu who said that statistics are like a bikini (“what they reveal is significant but what they hide is crucial”, or something like that). That Manchester City haven’t done a double over Liverpool in 80 years doesn’t mean a thing, nor does the fact that they haven’t won at Anfield in 15 years.

Basically, until the mid 2000s, City were a middling team. I remember telling Baada after the 2007 season (when Stuart Pearce got fired as City manager) that they’d surely be relegated the next season. And then came the investment from Thaksin Shinawatra. And the appointment of Sven-Goran Eriksson as manager. And then the YouTube signings. And later the investment from the Abu Dhabi investment group. And in 2016 the appointment of Pep Guardiola as manager. And the significant investment in players after that.

In other words, Manchester City of today is a completely different team from what they were even 2-3 years back. And they’re surely a vastly improved team compared to a decade ago. I know Baada has been following them for over 15 years now, but they’re unrecognisable from the time he started following them!

Yes, even with City being a much improved team, Liverpool have never lost to them at home in the last few years – but then Liverpool have generally been a strong team playing at home in these years! On the other hand, City’s 18-game winning streak (which included wins at Chelsea and Manchester United) only came to an end (with a draw against Crystal Palace) rather recently.

So anyways, here are the takeaways:

  1. Whether I watch the game or not has no bearing on how well Liverpool will play. The instances from this season so far are based on (a) small samples and (b) biased samples (since I’ve chosen to watch Liverpool’s two toughest games of the season).
  2. The 80-year history of the fixture has no bearing, since the teams have evolved significantly over those 80 years. That a record has stood so long has no meaning or predictive power for tomorrow’s game.
  3. City have been in tremendous form this season, and Liverpool have just lost their key player (by selling Philippe Coutinho to Barcelona), so City can fancy their chances. That said, Anfield has been a fortress this season, so Liverpool might just hold (or even win it).

All of this points to a good game tomorrow! Maybe I should just watch it!



More issues with Slack

A long time back I’d written about how Slack was in some ways like the old DBabble messaging and discussion group platform, except for one small difference – Slack didn’t have threaded conversations, which meant that it was only possible to hold one thread of thought in a channel, significantly limiting discussion.

Since then, Slack has introduced threaded conversations, but has done it in an atrocious manner. The same linear feed in each channel remains, but there’s now a way to reply to specific messages. However, even in this little implementation Slack has managed to do worse than WhatsApp – by default, unless you check one little checkbox, your reply is only sent to the person who originally posted the message, and isn’t actually posted to the group.

And if you click the checkbox, the message is displayed in the feed, but in a rather ungainly manner. And threads are only one level deep (this was one reason I used to prefer LiveJournal over blogspot back in the day – comments could be nested in the former, allowing for significantly superior discussions).

Anyway, the point of this post is not about threads. It’s about another bug/feature of Slack which makes it an extremely difficult tool to use, especially for people like me.

The problem with Slack is that it nudges you towards sending shorter messages rather than longer ones. In fact, there’s no facility at all to send a long, well-constructed argument unless you keep hitting Shift+Enter every time you need a new line. There is an “insert text snippet” feature, but that lacks richness of any kind – bullet points, for example.

What this does is force you to use Slack for quick messages only, or to share only summaries. It’s possible that this is a design feature, intended to cater to the short attention span of the “Twitter generation”, but it makes Slack an incredibly hard platform on which to have real discussions.

And when Slack is the primary mode of communication in your company (some organisations have effectively done away with email for internal communications, preferring to put everything on Slack), there is no way at all to communicate nuance.

PS: It’s possible that the metric for someone at Slack is “number of messages sent”. And nudging users towards writing shorter messages can mean more messages are sent!

PS2: DBabble allowed for plenty of nuance, with plenty of space to write your messages and arguments.


Nested Ternary Operators

It’s nearly twenty years since I first learnt to code, and the choice of language (imposed by school, among others) then was C. One of the most fascinating things about C was what was simply called the “ternary operator”, which is kinda similar to the IF function in Excel, the ifelse function in R and np.where in Python.

Basically, the ternary operator consisted of a ‘?’ and a ‘:’. It was a statement that took the form “if this then that, else something else”. So, for example, if you had two variables a and b, and had to return the maximum of the two, you could use the ternary operator to say a > b ? a : b.

Soon I was attending programming contests, where there would be questions on debugging programs. These would inevitably contain one question on ternary operators. A few years later I started attending job interviews for software engineering positions. The ternary operator questions were still around, except that now it would be common to “nest” ternary operators (include one inside the other). It became a running joke that the only place you’d see nested ternary operators was in software engineering interviews.
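Python (the kind of language you’re likelier to meet in these interviews today) has an equivalent conditional expression, and nesting it shows exactly why the construct earned its reputation. The nested one-liner below is hypothetical interview fare; the unrolled version does the same thing.

```python
# C's  a > b ? a : b  becomes, in Python:
def maximum(a, b):
    return a if a > b else b

# A nested version - concise, but you have to unpick it mentally
def sign_nested(x):
    return "positive" if x > 0 else ("negative" if x < 0 else "zero")

# The same logic unrolled: more lines, far easier to read and debug
def sign_readable(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"
```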

The thing with the ternary operator is that while it allows you to write your program in fewer lines of code and make it seem more concise, it makes the code a lot less readable. This in turn makes it hard for people to understand your code, and thus makes it hard to debug. In that sense, using the operator while coding in C is not considered particularly good practice.

It’s nearly 2018 now, and C is not used that much any more, so the ternary operator, and the nested ternary operator, have made their exit – even from programming interviews, if I’m not wrong. However, people still continue the practice of writing highly optimised code.

Now, every programmer who thinks he’s a good programmer likes to write efficient code. There’s a sense of elegance about code written concisely, using only a few lines. Sometimes such elegant code is also more efficient, speeding up computation and consuming less memory (think, for example, of vectorised operations in R).
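The vectorisation point carries over to Python’s NumPy: the one-liner below computes the same sum of squares as the explicit loop, and is both shorter and much faster (a small illustrative sketch).

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Explicit loop: easy to follow, but slow in pure Python
total = 0.0
for v in x:
    total += v * v

# Vectorised: one line, and orders of magnitude faster
total_vec = float(np.dot(x, x))
print(total, total_vec)
```

Here the concise version happens to be *more* readable as well - which is precisely what nested ternary operators are not.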

The problem, however, is that such elegance comes with a tradeoff with readability. The more optimised a piece of code is, the harder it is for someone else to understand it, and thus the harder it is to debug. And the more complicated the algorithm being coded, the worse it gets.

It makes me think that the reason all those ternary operators used to appear in those software engineering interviews (FYI I’ve never done a software engineering job) is to check if you’re able to read complicated code that others write!

NRI Diaries: Day 3

The longer I’m here, the less I feel like an NRI and the more I go back to my earlier resident self. You can expect this series to dry out in a few days.

So Saturday started with a reversion of jetlag – I woke up at noon, at my in-laws’ place. One awesome breakfast/lunch/brunch later (call it what you want – I ate breakfast stuff at 12:30 pm), it was time to get back home since I had some work at some banks around here.

I decided to take the metro. The wife dropped me off by scooter at the Rajajinagar Metro Station. The ticket to South End Circle cost Rs. 30. The lady behind the counter didn’t crib when I gave her Rs. 100, and gave change.

Having used the metro as my primary mode of transport in London for the last nine months, I’m entitled to some pertinent observations:

  • Trains seemed very infrequent. When I went up to the platform, the next train was 8 minutes away. And there was already a crowd building up on the platform
  • Like in London, the platform has a yellow line and passengers are asked to wait behind that. But unlike in London, the moment you go near the yellow line, a guard whistles and asks you to get back. I’m reminded of Ravikiran Rao’s tweetstorm on Jewish walls.
  • For a Saturday afternoon, the train was extremely crowded.
  • My skills from an earlier life of expertly standing and grabbing a seat in a BMTC bus were of no use here, since other passengers also seemed to have that skill
  • My skills from the last few months in knowing where to stand comfortably in a crowded train were put to good use, though. I managed to read comfortably through my journey
  • It took 20 mins to get to South End. Another 10 mins walk home. Not sure this is quicker than taking a cab for the same journey

Afternoon was spent running around banks updating mobile number and Aadhaar. It was all peaceful, except for Punjab National Bank asking for a physical copy of my Aadhaar (which quite defeats the purpose! HDFC told me to update Aadhaar online. ICICI did it through ATM!).

In the evening I let go of some more vestiges of my NRI-ness. I got the water filter at home cleaned and started drinking filtered tap water. And then I went and had chaat at a street gaaDi. I promptly got “spicy burps”. I guess it was the masala powder he added.

I quickly made amends by going to my favourite jilebi stall and belting jilebi.

Then I went to meet fellow-NRI Paddy-the-Pradeep for coffee at Maiya’s in Jayanagar. We ordered bottled water, discussed first world economics and made jokes about NRIs carrying around bottled water. And then we walked out carrying the leftover bottled water as a NRI badge.

On my way home, I went to a nearby bakery and got plain cake, nippaTT and Congress.

All is well.

NRI Diaries: Day 2

I know this is a day late, but the reasons for that will be apparent by the end of the post.

Day Two (15th December) started with waking up at 9 am – jetlag had clearly not worn off. I was going to be late for my 10:30 meeting and started getting ready in a hurry only to see a text from the person I was meeting that he was late as well.

Once again I took an auto rickshaw for breakfast. Meter showed Rs. 35. I handed a Rs. 100 note. Driver said “no change”, and didn’t seem to mind when I told him that I’ll get change from the restaurant I planned to eat at and that he should wait. I bought coupons for my food, and brought back Rs. 50 for the auto guy, and he promptly gave me the change.

The meeting in question was on the other side of Silk Board, and I was dreading the commute. Surprisingly, the commute was rather smooth, taking less than 20 minutes from Jayanagar 4th T Block to HSR Layout. Along the way I got to hear the driver’s life story as he was constantly on the phone with a friend of his.

Traffic was worse on the way back from the meeting (started from HSR around 1230 pm). Took nearly an hour to get home (Jayanagar 3rd Block). And along the way I saw this:

I honestly miss this kind of stuff back in the UK, where I find people taking “data science” too seriously (another post on that sometime in the future).

Lunch was swiggied. The main course came from Gramina Thindi. It’s a tiny restaurant and doesn’t have a computer, so it’s not integrated into Swiggy’s ordering system. So Swiggy actually sent a guy to the restaurant to place my order, and he waited there while it was being prepared and then brought it home to me.

I totally didn’t mind the Rs. 35 “delivery fee” they charged on top of my Rs. 55 lunch.

Dessert was from Corner House. Cake Fudge was as excellent as usual. Made a mental note to introduce this delicacy to the daughter before this trip is up.

And then it was time to go launch my book. Sales of the book are not exclusive to Amazon any more – it’s also available at Higginbothams on M G Road, which is where the book launch happened.

The launch was at this nice outdoor backyard of the store. I spoke to Pavan Srinath about some of the concepts I’ve described in the book. After that I signed copies, trying hard to get a wisecrack for everyone I signed for. I mostly failed.

The highlight of the launch was this guy zipping across the venue right behind me on a scooter, and then loudly honking. He was followed by another guy on a bike.

After the launch function was over, the wife and I decided to head to Mahesh Lunch Home for dinner. We took an auto. The guy at MG Road demanded Rs. 80 (ordinarily an exorbitant amount) to take us to Richmond Circle. We instantly agreed and got in.

He may have had some sense of seller’s remorse after that – in that he probably priced himself too low. So he drove slowly and, as we got to Richmond Circle, he said it would cost us a further Rs. 20 to take us across the road to Mahesh. We paid up again.

Something’s seriously wrong with Uber in Bangalore, it seems. Of the six times I’ve tried using the service, I’ve got a cab within 5 minutes on only one occasion. On a few occasions, the wait has been upwards of 10 minutes. And when the app showed that the nearest Uber was 20 minutes away, we simply decided to take an auto rickshaw.

Except that we’d not bargained for drivers refusing outright to take us to Rajajinagar. One guy agreed and after we got in, asked for Rs. 300. This time, with our stomachs full, we were less charitable and walked out. Some walking and more waiting later, we were on our way to Rajajinagar, where I spent the night.

Oh, and it appears that the daughter has been afflicted by NRI-itis as well. She bears a red mark on her cheek following a mosquito bite.

NRI Diaries: Day 1

So I arrived in Bangalore this morning, after nine months in London. This makes this my first visit to India as a “Non Resident Indian” (NRI), and since foreign papers quite like getting opinions of India from NRI observers, I thought it makes sense to document my pertinent observations. I should mention upfront, though, that nobody is paying me for these observations.

The day began after a very short night’s sleep (we went to bed at 11 pm British Time and woke up at 7:30 AM India Time, a total of three hours) with a visit to one of our favourite breakfast establishments in Bangalore – Mahalakshmi Tiffin Room.

It was the daughter’s first ever auto rickshaw ride (back when we lived here we had a car, and she was really tiny, so we didn’t need to take her in an auto). She seemed rather nonchalant about it, occasionally turning her head to look outside. The auto ride cost us Rs 30. We gave Rs 100 and the driver asked us if we didn’t have change. Living outside makes you unlearn the art of change management.

We got our usual table at MLTR and were greeted by a rather usual waiter plonking three glasses of water on our table. We politely declined and requested for Bisleri.

After breakfast, it was time to get connected. I went to a medical shop near my home which I knew offers mobile phone top up services. Topping up the wife’s phone was rather straightforward, though it took some time given the crowd. During my fifteen minutes at the medical shop, at least six people came requesting for mobile phone top ups. Only two came asking for medicines. India seems to be getting healthier and wealthier.

Airtel decided to reassign my number to someone else so I needed a new SIM. I asked the medical shop guy for a Reliance Jio SIM. He spent ten minutes trying to log in to his Jio vendor app, and I gave up and took my business elsewhere. This elsewhere was a really tiny hole in the wall shop, which had a fingerprint reader that enabled the issue of a Jio SIM against Aadhaar authentication. The process was a breeze, except that I consider it weird that my mobile number starts with a 6 (the number I lost was a 9845- series Airtel).

Waiting at the hole-in-the-wall also made me realise that standing at shopfronts is not common practice in London. Thanks to high labour costs, most shops there are “self-service”. It’s also seldom that several people land up at one shopfront in London at the same time!

Losing my old number also meant I had to update the number with banks. I started with State Bank of India. The process was rather simple – took no more than 2 minutes. While at it, I asked about Aadhaar linking of my bank account there. There seems to be some confusion about it.

For example, I heard that if you have multiple accounts with the bank, you should only link one of them with Aadhaar – which defeats the purpose of the exercise, if one exists! Then, joint accounts need only one Aadhaar number to be linked. The linking process also differs based on who you ask. In any case, I encountered one rather helpful officer who completed my Aadhaar linking in a jiffy.

Then, my book is launching tomorrow which means I needed to buy new clothes. I landed up at FabIndia, and as is the practice in forin, I kept saying “hi” and “thank you” to the salespeople, who kept muttering “you’re welcome, sir”. While at it, the missus discovered that FabIndia now has rather explicit sales targets per store, which possibly explains why the salespeople there were more hands on compared to earlier.

Later in the evening, I got a haircut and a head massage. The last time I visited this salon, it was called “noble” (a rather common name for haircutting shops in Bangalore. Like Ganesh Fruit Juice Centres). Now it’s called “nice cuts”. The head massage was fantastic – I miss this kind of service back in the UK. I also borrowed the inlaws’ car and drove it around and even managed to parallel park it – nine months of no driving has done no harm to my driving skills.

Hopefully I’ll have more observations tomorrow.