Modern Ganeshas

Om Ganeshaaya Namaha

There is this theory I have heard – just that I have forgotten the source – that Ganesha was not originally part of the Hindu pantheon, but was a local god who was coopted into the fold later on. In fact, the same is said of his “brothers” Karthikeya and Ayyappa, and it is interesting that all these cooptions happened as sons of Shiva.

Back to Ganesha, the story goes that he is “vighneshwara” not because he removes obstacles (“vighnas”) but because he is the “obstacle god” (direct translation of vighneshwara). The full funda is – the locals who had Ganesha as their god allowed him to become part of the Hindu pantheon (and thus themselves becoming Hindus) under the express condition that he be worshipped in advance of any of the other gods in the Hindu pantheon.

Now, as even most non-practising Hindus will know, pretty much every Hindu ritual starts with a worship of Ganesha. It doesn’t matter which other god you are trying to worship, you always start with a prayer to Ganesha (unless, of course, you are a radical Vaishnavite – in which case, Ganesha, as a son of Shiva, is taboo).

The polite explanation of this is “Ganesha is such a great god, and a remover of obstacles, you better worship him first so that the rest of your worship goes without obstacles”.

The more realist (and impolite, and controversial) explanation (again I’ve forgotten the source) is that if you started a worship without worshipping Ganesha at first, the locals who had “contributed” him to the pantheon would get pissed off and ransack your worship. And so the Ganesha worship at the beginning of every worship (and invocation ceremony) originally started as a form of blackmail, and then became part of culture. Eventually, it became lip service to Ganesha.

Earlier this year, I was watching the Australian Open. The finals ended, and it was time for the prizes. And at the beginning of the prize distribution, the announcer (Todd Woodbridge) said (paraphrasing) “we begin with a worship to the native peoples of Australia on whose lands we now stand”. It was similar to some episode of Masterchef Australia 2-3 years back, which again started with the same “invocation”.

OK I actually found the video of Woodbridge from this year:

 

In this particular case, what has happened is that Australia has (finally) learnt about racism, and is now going overboard to identify all forms of overt or covert racism, past and present. The modern Ganesha-worshippers are the people whose job it is to point out every instance of overt or covert racism. If you don’t worship this Ganesha (talking about the “native peoples whose lands we stand upon”), the Ganesha-worshippers will come for you and maybe disturb the rest of your worship.

Ultimately, like the original Ganesha worship, this has turned into lip service.

“Modern Ganeshas” are not restricted to Australia. I just read this hilarious tweet (new Twitter rules mean I have to copy paste here):

Have been on college tours in the Northeast. Every admissions officer and student volunteer starts with (1) a declaration of their pronouns, and (2) an acknowledgement of the stolen native lands their college is placed upon.

This is a similar modern Ganesha worship, but practised in the US. Lip service paid so that the “modern Ganesha worshippers” don’t come and disturb your worship.

When Colin Kaepernick knelt down during the playing of the (US) national anthem, he made a powerful statement. But then, when people started randomly taking the knee at the beginning of events (especially immediately after George Floyd’s murder), it turned into “modern Ganesha worship” (lip service so that the worthies don’t get offended).

And no political “wing” or party has a monopoly on modern Ganesha worship. In some places, ceremonies routinely start with praise being conferred on some “dear leader”. Literal Ganesha worship can also help in modern times, since that still has its guardians. You can include recitals of (whichever nation’s) national anthems, or readings from the constitution into this list.

The less memetically fit of these worships will fade away (or burn out, in case of a change in government). The more memetically fit of these worships will remain, but over a period of time turn into Ganesha worship – a token done out of habit and practice rather than due to fear of any contemporary reprisal.

Algo trading and ice cream

I refuse to share ice cream with my daughter, just like I used to refuse to share peanuts with my father. This refusal to share in both cases primarily has to do with the differential speed of consumption.

With my father and peanuts, it was a matter of ability – as someone who had grown up on a peanut farm (and thus he was a fan of Jimmy Carter), he was an expert at shelling peanuts. The Bangalore-born me was much less expert, and so before I knew it he would have finished the lot of it.

With my daughter and ice cream, it is a matter of willingness – she likes to finish it quickly, in big spoons. I like to savour it over a long time – at home, I use a rather small spoon and eat it slowly. Nowadays I’ve been trying to cut down sugars and so when I eat them I try to get the maximum benefit out of them and thus eat slowly. However, even as a child I would eat my desserts slowly, trying to “extract maximum benefits”.

So last night we were having ice cream (individual small tubs of course). Daughter finished hers quickly and came to me, to see that my tub was still half full (and I was blogging as I was eating it).

“Appa, why do you like to turn your ice cream into milkshake?”, she asked.

“I don’t”, I said, “I just try to get the maximum value out of it, and thus I eat it slowly”.

“But then if you take too long to eat, then it turns into milkshake which is much less enjoyable than ice cream”, she countered. She had a valid point.

And then I realised this is exactly the problem I worked on during my stint as an investment banking quant in 2009-11. I was working on algo trading, specifically execution of large block deals.

The tradeoff there was that if you traded too quickly, you would end up moving the market and thus trading at an unfavourable price. On the other hand, if you traded too slowly, the natural volatility of the stock would mean that the market might move against you. And so you had to balance the two and trade.
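As a rough illustration of this tradeoff (a stylised toy model, and emphatically not the method used at the bank), suppose the order is executed evenly over some number of periods. Market impact is assumed to cost more the faster you trade, and volatility risk is assumed to cost more the longer you take. The function names, the quadratic cost forms and every parameter value below are assumptions made up for the sketch.

```python
import math

# Toy model: sell `shares` evenly over `horizon` periods.
# All parameter values are hypothetical, chosen only for illustration.

def execution_cost(horizon, shares=1_000_000, impact=1e-7,
                   risk_aversion=1e-6, volatility=0.02):
    """Total cost = market impact (falls with horizon) + risk (rises with horizon)."""
    # Trading faster means a larger order per period, so impact scales as 1/horizon.
    impact_cost = impact * shares ** 2 / horizon
    # The longer the position is held, the more the price can drift against you.
    risk_cost = risk_aversion * volatility ** 2 * shares ** 2 * horizon / 3
    return impact_cost + risk_cost

def best_horizon(impact=1e-7, risk_aversion=1e-6, volatility=0.02):
    """Horizon minimising a/T + b*T, which is sqrt(a/b); `shares` cancels out."""
    return math.sqrt(3 * impact / (risk_aversion * volatility ** 2))

if __name__ == "__main__":
    t_star = best_horizon()
    for t in (t_star / 4, t_star, t_star * 4):
        print(f"horizon={t:7.1f}  cost={execution_cost(t):12.1f}")
```

In this sketch, trading much faster or much slower than the balancing horizon both cost more, which is the same shape as the ice cream problem: too fast and you lose value, too slow and it melts.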

I won’t go into the details on how we solved it (my erstwhile bank might not like it), but it suffices to say here that it is similar to eating ice cream.

If you eat too quickly, you run the risk of not getting sufficient “benefit” out of the ice cream at hand. If you eat too slowly, then there is the risk that the ice cream itself will melt and thus be less enjoyable for you.

I tried explaining this analogy to my daughter last night, but she didn’t get it. I guess she is too young to understand risk, volatility, market impact and the like.

And so I’m inflicting this on you!

Bayes Theorem and Respect

Regular readers of this blog will know very well that I keep talking about how everything in life is Bayesian. I may not have said it in so many words, but I keep alluding to it.

For example, when I’m hiring, I find the process to be Bayesian – the CV and the cover letter set a prior (it’s really a distribution, not a point estimate). Then each round of interview (or assignment) gives additional data that UPDATES the prior distribution. The distribution moves around with each round (when there is sufficient mass below a certain cutoff there are no more rounds), until there is enough confidence that the candidate will do well.

In hiring, Bayes theorem can also work against the candidate. Like I remember interviewing this guy with an insanely spectacular CV, so most of the prior mass was to the “right” of the distribution. And then when he got a very basic question so badly wrong, the updation in the distribution was swift and I immediately cut him.

On another note, I’ve argued here about how stereotypes are useful – purely as a Bayesian prior when you have no more information about a person. So you use the limited data you have about them (age, gender, sex, sexuality, colour of skin, colour of hair, education and all that), and the best judgment you can make at that point is by USING this information rather than ignoring it. In other words, you need to stereotype.

However, the moment you get more information, you ought to very quickly update your prior (in other words, the ‘stereotype prior’ needs to be a very wide distribution, irrespective of where it is centred). Else it will be a bad judgment on your part.

In any case, coming to the point of this post, I find that the respect I have for people is also heavily Bayesian (I might have alluded to this while talking about interviewing). Typically, in case of most people, I start with a very high degree of respect. It is actually a fairly narrowly distributed Bayesian prior.

And then as I get more and more information about them, I update this prior. The high starting position means that if they do something spectacular, it moves up only by a little. If they do something spectacularly bad, though, the distribution moves way left.

So I’ve noticed that when there is a fall, the fall is swift. This is again because of the way the maths works – you might have a very small probability of someone being “bad” (left tail). And then when they do something spectacularly bad (well into that tail), there is no option but to update the distribution such that a lot of the mass is now in this tail.

Once that has happened, unless they do several spectacular things, it can become irredeemable. Each time they do something slightly bad, it confirms your prior that they are “bad” (on whatever dimension), and the distribution narrows there. And they become more and more irredeemable.
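The "swift fall" can be seen in a two-state simplification of the argument above, where a person is either "bad" or "good" and each observed act updates the probability of "bad" through Bayes' theorem. This is only a sketch: the function name and all the probabilities are made-up assumptions, and a real version would track a full distribution rather than a single number.

```python
def posterior_bad(prior_bad, p_event_if_bad, p_event_if_good):
    """Bayes' theorem: P(bad | event) from the prior and the two likelihoods."""
    evidence = prior_bad * p_event_if_bad + (1 - prior_bad) * p_event_if_good
    return prior_bad * p_event_if_bad / evidence

# Hypothetical numbers: start with high respect, i.e. only a 5% chance of "bad".
prior = 0.05

# A spectacularly bad act: assumed 50x more likely from a "bad" person.
after_fall = posterior_bad(prior, p_event_if_bad=0.5, p_event_if_good=0.01)

# A merely slightly bad act: assumed only 2x more likely from a "bad" person.
after_slip = posterior_bad(after_fall, p_event_if_bad=0.6, p_event_if_good=0.3)

print(f"prior={prior:.2f}  after fall={after_fall:.2f}  after slip={after_slip:.2f}")
# prints: prior=0.05  after fall=0.72  after slip=0.84
```

With these assumed numbers, one spectacular failure takes the probability of "bad" from 5% to about 72%, and a later minor slip, which would barely have registered at the start, pushes it further to about 84%.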

It’s like “you cannot unsee” the event that took their probability distribution and moved it way left. Soon, the end is near.

 

RG

Last night some colleagues and I were discussing the case of the Titan Submersible. For people who will be reading this after the news cycle has passed, this is basically a submersible that took people to see the debris of the Titanic, and then disappeared.

At the time of discussion, there was reportedly “20 hours of oxygen left” in the vessel, which meant rescue operations had to go on quickly. Then again, I’m writing this 23 hours after our conversation and there is no update yet, so I don’t know what that “20 hours” means.

In any case, someone in the group said “the worst thing that will happen is if someone panics. At that point, the rest of the people will have no option but to just kill this person”. I took a while to figure out what was happening, and then someone mentioned that when you panic, you tend to consume more oxygen.

The “20 hours of oxygen” was at “ground state”, with everyone remaining calm and consuming the average human amount of oxygen. However, if someone panicked, their rate of consumption of oxygen would go much higher, meaning the oxygen reserves will get drawn down much faster, thus lessening the chance of the others to be found.

So, from an expected value basis, it is rational for the rest of the people to kill the panicker, and give themselves a better chance of being found.

There was nobody from my JEE coaching factory in the group, so I didn’t talk about this there, but I got reminded of this story back from 1999 (I wrote JEE in 2000).

Our JEE factory had been making efforts to “imbibe us with fire in the belly”. As one of the teachers in the factory had told us in class, “naavu Kannadigarige aambode mosaranna koTTbiTTre khushhyaagiddbiDtivi” (if someone gives us Kannadigas falafel and curd rice, we’ll live happily forever, and we will forget about working hard).

And so there was this feeling that we need to be taught to be more competitive and ruthless, and part of the factory process involved giving us inspirational lectures to that effect.

“Ning kOpa baralva?” (“don’t you get angry?”), they would ask. They would ask us to imagine something that would make us angry, and then “channel that anger towards cracking JEE”. We needed to have that killer instinct, they would say.

Again, in the context of yesterday’s discussion on the Titan submersible and limited oxygen supplies, I got reminded of yet another of these inspirational speeches from our factory, about the killer instinct.

Remember that this was 1999. The Kargil War had just ended, and was still on everyone’s minds. I’m paraphrasing what one of the teachers said.

“Imagine you are in the army. There is a very good friend with you. You went through the defence academy together, and have always served together. Now you are at war. 

The fight isn’t going very well and you both are hiding somewhere. And then your friend gets hit badly. He is alive but very very badly hurt and can’t move. And he can’t help but groan, and that means there is the risk of giving away your location to the enemy.

So what do you do? You put a bullet in his back and put him out of your misery. Yes, he is your friend. You have both served together for the longest time. But at that moment, you should be willing to shoot him because that is your only chance of survival.”

I don’t know what impact it had on us. The only impact it had on me is that it got etched in my super-normal long term memory. And in a very different, but sort of related context, I remembered it yesterday.

Oh, and when we went to IIT, we found that there was a term for this – “RG”, from “relative grading”. Because grading in most courses was relative, one way of getting better grades was to make sure others performed worse than you (even if you couldn’t perform better).

This took bizarre forms – hiding books in the library so nobody could find them; refusing to share your notes with your classmates; doing much more than required in your course assignments and term papers (this was very very common in my Computer Science class); flattening the tyres of your classmates’ cycles on exam days; teaching others the wrong formulae; and so on.

So in that sense, our factory teachers knew what they were prepping us for!

Hybrid work

I’m in a job that can broadly be described as “hybrid”. The mandate from HR is that we are “expected to be in office three days a week, and live in the same city as the office”. Nobody really checks how often people go in to office, though I do end up going three times a week on average.

Of late, some tech “gurus” have taken to dunking on hybrid work. DHH of 37signals / Basecamp (I quite like his blog, in general) wrote that “hybrid combines the worst of in-person and remote”. Then, Paul Graham wrote some tweets on remote work. I quite like this one:

Back to hybrid work – I’m in a hybrid role now, where I go into office about three days a week on average, and stay home the other two days (in general, because Monday is crowded with long online meetings, and another day to do some “thinking work”). Different people in my company have different such strategies, and all come into office on their own schedules.

This is not the first time I’m doing “hybrid”. During my rather long independent consulting career, I largely worked from home but travelled to clients’ offices every so often (once a week if in Bangalore; one week a month if not; on average). It was about getting the best combination of focussed work and collaboration. It worked then, and it works now.

In fact, as far back as 2007 I was in a hybrid office. I was in what is now called a “global capability centre”, and interacting with headquarters in Texas meant being available for calls later in the evening. Consequently, we could work from home a few days a week as long as we were available for these calls.

Coming as it did at the beginning of my career, it was a disaster. I slacked like nobody’s business. Less time spent in office meant less time understanding parts of the business not directly concerned with what I was working on. Most of my development in that period happened due to my independent reading and writing, rather than due to my work.

Now, once again, I’m in a company with “multiple headquarters”. This means that irrespective of where you are, you end up spending a considerable amount of time on video calls with people in other locations. According to DHH, video calls when you are in office are a waste of office time. I agree with him there. The way I manage is through my schedule.

Of course, it helps that I have a reputation in office that I don’t like to do unnecessary meetings – and all matters need to be resolved to the extent possible in text messages or email. This means I spend less time on video calls than many of my colleagues, and when I find a lot of them appearing on a day, I spend that day at home.

Also, I have an unspoken agreement with my (rather small) team on days of the week when we’ll meet in office, and so the technical discussions I find so difficult to have online can be had in person.

Hybrid primarily works because of optionality (a rather underappreciated concept). In my line of business, things can get so technical that there is a limit on the complexity of discussions that can be had online. Similarly, things can get so technical that we need undisturbed alone time to think through some of the solutions.

Hybrid works because it allows for both – it allows you to have your me time for your deep thinking, and the optionality of summoning a teammate to office “tomorrow” for some deep collaboration. The former is unavailable in an all-in office; the latter is not possible if you’re fully remote (I’ve experienced this during the pandemic years).

Yes, hybrid means you need to live within commuting distance of office (sometimes during interviews, I see candidates furiously googling for “richmond circle” or “residency road” when I tell them our office is there. It’s a strong signal that they’re not going to join 😛 ). However, that you only need to commute twice a week (rather than 5 times a week) means you can choose to live a little bit farther.

Yes, it does make hiring harder (compared to all-remote), but once hired, people can be far more productive in a hybrid model. With the option of doing deep work without the danger / fear of someone poking you (this literally happened to me yesterday) when you’re in the middle of deep work!

So yes, put me down as someone who likes the hybrid model of work.

Inverse Endorsements

The main purpose of a brand endorsing an entity – either a person or a team or an event – is so that people who associate themselves with, or simply follow, the latter, will gain awareness of the brand. For example, if you think of “Philips top 10”, every time you think of song countdown shows on prime time TV, you think of Philips.

A lot of times it works. For example, in 2005 (after the Champions League Semis first leg) I started following Liverpool FC. I quickly found that their shirt sponsor was the Danish beer brand Carlsberg. A couple of weeks later, I’d gone for drinks with my then colleagues, and was asked what beer I would have. Having no basis to make my decision (I wasn’t much of a beer drinker then), I went for Carlsberg, which was “my (newfound) team”’s brand.

This is all basic stuff.

However, sometimes the causation can flow the other way as well. This especially has to do with little-known brands that are largely in the viewers’ minds because of their association with one single entity. Long back I had written about “triangle marketing” – where people will notice an entity if they learn of it from two or more independent sources. In the absence of a second source in which you learn about a company or brand, your only association of it is due to the endorsement, and you start associating the two together.

I started watching the English Premier League sometime in 2006 – before that most of my football watching had been restricted to World Cups and European Championships (and the semis of the 2005 Champions League). Since it was from a foreign country (I’d interned in London in 2005 but then chose to take up a job in India post my graduation in 2006), I wasn’t aware of many of the brands who had their logos on the teams’ shirts. And so there was no other way to learn about the brands, and I started instantly associating them.

For example, I’ve never been into running and the like, so it wasn’t until 2012 or so that I learnt of Garmin as being a very good fitness band. However, I’d seen plenty of the brand in the mid-noughties, on the Middlesbrough jersey.

Even now, when I see Garmin, I first think of Middlesbrough. Because my mind associated these two brands, but not the causal direction. In other words, the mind registers the correlation, not the causation.

Then there is the Indian dairy brand Akshayakalpa. I like their ghee and cheddar, but find their Paneer inferior to Milky Mist. Nevertheless, a few years back I first heard of them when they sponsored this young Indian grandmaster named Nihal Sarin. Now every time I see Akshayakalpa (even when I’m buying their ghee or cheddar, or paneer), I think of Nihal Sarin.

There are many other such examples that I think of from time to time – when I see the sponsoring brand and think of the sponsored brand, but I’m not able to remember those right now, so I’ll stop here.

PS: I remembered now what the other inverse endorsement is. I was watching Ponniyin Selvan 2 (an atrocious movie) last weekend, and saw it was by “Lyca Productions”. My immediate thought was “this is the company behind Lyca Kovai Kings”.

New blood joins this team

I intended to write this a year ago, when Sadio Mane left Liverpool after six brilliant years at the club. There was much heartbreak among the club fan base about Mane leaving, and a lot of people saw it as a failure on the part of the management and ownership in terms of not being able to keep him.

Now, a year on, I admit that Darwin Nunez hasn’t quite set the club on fire (though I personally quite like him), but as a general principle, this kind of “freshening up” is a highly necessary process in a team, if you need to avoid stagnation.

A month or two back, I was watching some YouTube video on “Liverpool’s greatest Premier League goals against Manchester City” (this was just before the 4-1 hammering at the Etihad). As the goals were shown one by one, I kept trying to guess which season and game it was in.

There were important clues – whether Firmino wore 9 or 11, whether Mane wore 19 or 10, the identity of some players, the length of Trent Alexander-Arnold’s hair, my memory of the scoreline from that game, etc. (Liverpool always wear the home Red at the Etihad, so the colour of the away kit wasn’t a clue).

However, for one goal I simply wasn’t able to figure out which season it was. There was TAA wearing 66, Fabinho, Henderson, the fab front three (Firmino-Mane-Salah, wearing 9-10-11 respectively) and Robertson. That’s when it hit me that for a fairly long time, a large part of Liverpool’s team had stayed constant! There was very little change at the club.

Now, there are benefits to having a consistently settled team (as the fabulous 2021-22 season showed), but there is also the danger of stasis. In something like football where careers are short, you don’t want the whole team “getting old together”. In the corporate world, people can get into too much of a comfort zone. And cynicism can set in.

Good new employees are always buzzing with ideas, fearless about what has been rejected before and who thinks how. As people spend longer in the organisation, though, colleagues become predictable and certain ways of doing things become institutionalised. Sooner than you know it, you would have become a “company man”, (figuratively) wearing the same white shirt and blue suits as your fellow company men, and socialising with your colleagues at the (figurative) company club.

There can be different kinds of companies here – some companies allow people to retain a lot of their individuality; and there the “decay” into company-manhood is slower. In this kind of a place, the same set of people can stay together for longer and still continue to innovate and add significant value to one another.

Other companies are less forgiving, and you very quickly assimilate, and lose part of your idiosyncrasy. Insofar as innovation comes out of fresh ideas and thinking and unusual connections, these companies are not very good at it. And in such companies, pretty much the only way to keep the innovative wheel going and continue to add value is by bringing in fresh blood, well, at a faster rate.

Putting it another way, if you are a cohesive kind of company, some attrition may not actually be a bad thing (unless you are growing rapidly enough to expand your team rapidly). To grow and innovate, you need people to think different.

And you get there either by having the sort of superior culture where existing employees continue to think different long after they’ve been exposed to one another’s thoughts; or by continuing to bring in fresh employees.

There is no other way.

Round Tables

One of the “features” of being in a job is that you get invited to conferences and “industry events”. I’ve written extensively about one of them in the past – the primary purpose of these events is for people to be able to sell their companies’ products, their services and even themselves (job-hunting) to other attendees.

Now, everyone knows that this is the purpose of these events, but it is one of those things that is hard to admit. “I’m going to this hotel to get pitched to by 20 vendors” is not usually a good enough reason to bunk work. So there is always a “front” – an agenda that makes it seemingly worthy for people to attend these events.

The most common one is to have talks. This can help attract people at two levels. There are some people who won’t attend talks unless they have also been asked to talk, and so they get invited to talk. And then there are others who are happy to just attend and try to get “gyaan”, and they get invited as the audience. The other side of the market soon appears, paying generous dollars to hold the event at a nice venue, and to be able to sell to all the speakers and the audience.

Similarly, you have panel discussions. Organisers in general think this is one level better than talks – instead of the audience being bored by ONE person for half an hour, they are bored by about 4-5 people (and one moderator) for an hour. Again there is the hierarchy here – some people won’t want to attend unless they have been put on the panel. And who gets to be on the panel is a function of how desperate one or more sponsors are to sell to the potential panelists.

The one thing most of these events get right is to have sufficient lunch and tea breaks for people to talk to each other. Then again, these are brilliant times for sponsors to be able to sell their wares to the attendees. And it has the positive externality that people can meet and “network” and talk among themselves – which is the best value you can get out of an event like this one.

However, there is one kind of event that I’ve attended a few times, but I can’t understand how they work. This is the “round table”. It is basically a closed room discussion with a large number of invited “panellists”, where everyone just talks past each other.

Now, at one level I understand this – this is a good way to get a large number of people to sell to without necessarily putting a hierarchy in terms of “speakers” / “panellists” and “audience”. The problem is that what they do with these people is beyond my imagination.

I’ve attended two of these events – one online and one offline. The format is the same. There is a moderator who goes around the table (not necessarily in any particular order), with one question to each participant (the better moderators would have prepared well for this). And then the participant gives a long-winded answer to that question, and the answer is not necessarily addressed at any of the other participants.

The average length of each answer and the number of participants means that each participant gets to speak exactly once. And then it is over.

The online version of this was the most underwhelming event I ever attended – I didn’t remember anything from what anyone spoke, and assumed that the feeling was mutual. I didn’t even bother checking out these people on LinkedIn after the event was over.

The offline version I attended was better in the way that at least we could get to talk to each other after the event. But the event itself was rather boring – I’m pretty sure I bored everyone with my monologue when it was my turn, and I don’t remember anything that anyone else said in this event. The funny thing was – the event wasn’t recorded, and there was hardly anyone from the organising team at the discussion. There was just no point in all of us talking for so long. It was like people who organise Satyanarayana Poojas to get an excuse to have a party at home.

I’m wondering how this kind of event can be structured better. I fully appreciate the sponsors and their need to sell to the lot of us. And I fully appreciate that it gives them more bang for the buck to have 20 people of roughly equal standing to sell to – with talks or panels, the “potential high value customers” can be fewer.

However – wouldn’t it be far more profitable to them to be able to spend more time actually talking to the lot of us and selling, rather than getting all of us to waste time talking nonsense to each other? Like – maybe just a party or a “lunch” would be better?

Then again – if you want people to travel inter-city to attend this, a party is not a good enough excuse for people to get their employers to sponsor their time and travel. And so something inane like the “round table” has to be invented.

PS: There is this school of thought that temperatures in offices and events are set at a level that is comfortable for men but not for women. After one recent conference I attended I have a theory on why this is the case. It is because of what is “acceptable formal wear” for men and women.

Western formal wear for men is mostly the suit, which means dressing up in lots of layers, and maybe even constraining your neck with a tie. And when you are wearing so many clothes, the environment better be cool else you’ll be sweating.

For women, however, formal wear need not be so constraining – it is perfectly acceptable to wear sleeveless tops, or dresses, for formal events. And the temperatures required to “air” the suit-wearers can be too cold for women.

At a recent conference I was wearing a thin cotton shirt and could thus empathise with the women.

 

Shrinking deadlines

I’m reminded of this old joke/riddle, which also happened to feature in Gowri Ganesha. “If a 1 metre long sari takes 1 hour to dry in the sun, how long will an 8 metre long sari take to dry?”.

The instinctive answer, of course, is 8 hours, while if you think about it (and assume that you have enough clothesline space to not need to fold), the correct answer is likely to be 1 hour.

Now this riddle is completely unconnected to the point of the post, except that both have to do with time.

And then one day you find, ten years have got behind you.
No one told you when to run. You missed the starting gun. 

Ok enough distractions. I’m now home, home again.

Modern workspaces are synonymous with tight deadlines. Even when you give a conservative estimate on how long something will take, you get asked to compress the timelines further. If you protest too much and say that there is a lot to be done, sometimes you might get asked to “put one more person on the job and get it done quickly”.

This might work for routine, or “fighter” jobs – for example, if your job is to enter and copy data for (let’s say) 1000 records, you can easily put another person on the job, and the entire job will be done in about half the time (allowing for a little time for the new person to learn the job and for coordination).

The more complex the job gets, the harder this becomes. At one level, the new person coming into the job needs to spend more time getting up to speed. Then, as the job gets more complex, it gets harder to divide and conquer, or to “specialise”. This means the new person coming in has less impact.

And then when you get closer and closer to the stud end of the spectrum, the advantage of putting more people on the job to get the work done faster gets smaller and smaller. There comes a point when the extra person actively becomes a liability. Again – I’m reminded of my childhood when occasionally I would ask my mother if she needed help in cooking. “Yes, the best way for you to help is for you to stay out of the kitchen”, she would say.
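To put rough numbers on this, here is a toy model of how long a job takes as you add people. It is only a sketch: the function name, the parameters and every number in it are my own assumptions, not measurements. The idea is that only a part of the job can be split up, each newcomer costs some onboarding time, and every pair of people on the job costs some coordination time.

```python
def completion_time(people, work_hours, parallel_fraction,
                    onboarding_hours, pair_overhead_hours):
    """Hypothetical elapsed hours for a job done by `people` persons.

    work_hours          -- hours one person alone would need
    parallel_fraction   -- share of the work that can be split up (0 to 1)
    onboarding_hours    -- assumed one-off delay once anyone new joins
    pair_overhead_hours -- assumed coordination cost per pair of people
    """
    serial_part = work_hours * (1 - parallel_fraction)
    shared_part = work_hours * parallel_fraction / people
    onboarding = onboarding_hours if people > 1 else 0.0
    coordination = pair_overhead_hours * people * (people - 1) / 2
    return serial_part + shared_part + onboarding + coordination


# A "fighter" job such as data entry: fully divisible, cheap to learn.
data_entry = dict(work_hours=100, parallel_fraction=1.0,
                  onboarding_hours=2, pair_overhead_hours=1)

# A "stud" job: mostly indivisible, expensive to learn and to coordinate.
creative = dict(work_hours=100, parallel_fraction=0.2,
                onboarding_hours=20, pair_overhead_hours=5)

for label, job in (("data entry", data_entry), ("creative", creative)):
    for people in (1, 2, 3):
        print(label, people, completion_time(people, **job))
```

With these made-up numbers, the second person nearly halves the data entry job, while the same second person makes the creative job take longer than it would have with one person alone.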

And then when the job gets really creative, there is a further limit on compression – a lot of the work is done “offline”. I keep telling people about how I finally discovered the proof for the Ramsey number R(3,3) while playing table tennis in my hostel, or how I had solved a tough assignment problem while taking a friend’s new motorcycle for a ride.

When you want to solve problems “offline” (to let the insight come to you rather than going hunting for it – I had once written about this) – there is no way to shorten the process. You need to let the problem stew in your head, and hope that some time it will get solved.

There is nothing that can be done here. The more you hurry, the lower the chances you give yourself of solving the problem. Everything needs to take its natural course.

I got reminded of it when we missed a deadline last Friday, and I decided to not think about it through the weekend. And then, an hour before I got to work on Monday, an idea occurred in the shower which fixed the problem. Even if I’d stressed myself (and my team) out on Friday, or done somersaults, the problem would not have been solved.

As I’d said in 2004, quality takes time.

Pre-trained models

On Sunday evening, we were driving to a relative’s place in Mahalakshmi Layout when I almost missed a turn. And then I was about to miss another turn and my wife said “how bad are you with directions? You don’t even know where to turn!”.

“Well, this is your area”, I told her (she grew up in Rajajinagar). “I had very little clue of this part of town till I married you, so it’s no surprise I don’t know how to go to your cousin’s place”.

“But they moved into this house like six months ago, and every time we’ve gone there together. So if I know the route, why can’t you?”, she retorted.

This gave me a trigger to go off on a rant on pre-trained models, and I’m going to inflict that on you now.

For a long time, I didn’t understand what the big deal was about pre-trained machine learning models. “If it’s trained on some other data, how will it even work with my data”, I wondered. And then recently I started using GPT-4 and other similar large language models. And I started reading blogposts on how with very little finetuning these models can do “gymnastics”.

Having grown up in North Bangalore, my wife has a “pretrained model” of that part of town in her head. This means she has sufficient domain knowledge, even if she doesn’t have any specific knowledge. Now, with a small amount of new specific information (the way to her cousin’s new house, for example), it is easy for her to fit the specific information into her generic knowledge and get a clear idea on how to get there.

(PS: I’m not at all suggesting that my wife’s intelligence is artificial here)

On the other hand, my domain knowledge of North Bangalore is rather weak, despite having lived there for two years. For the longest time, Malleswaram was a Chakravyuha – I would know how to go there, but not how to get back. Given this lack of domain knowledge, the little information on the way to my wife’s cousin’s new house is not sufficient for me to find my way there.

It is similar with machines. LLMs and other pre-trained models have sufficient “generic domain knowledge” in lots of things, thanks to the large amounts of data they’ve been trained on. As a consequence, if you can train them on fairly small samples of specific data, they are able to generalise around this specific data and learn around it.
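Here is a toy sketch of that idea. It is nowhere close to how an LLM is actually finetuned; the two-parameter model, the “true” relationship and the starting values are all assumptions I made up for illustration. Two copies of the same tiny model are given one single new example. One starts from roughly correct “pretrained” parameters, the other starts from zero, and both are then tested on points they have never seen.

```python
# Toy illustration only: a two-parameter linear model y = w * x + b.
# Every number here is made up for the sketch.

def fine_tune(start, samples, steps=50, lr=0.05):
    """Plain gradient descent on mean squared error, beginning at `start`."""
    w, b = start
    n = len(samples)
    for _ in range(steps):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in samples) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mean_squared_error(params, samples):
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in samples) / len(samples)


def truth(x):
    """Hypothetical "true" relationship the models are trying to learn."""
    return 3 * x + 2


new_information = [(1, truth(1))]           # one specific new example
held_out = [(3, truth(3)), (4, truth(4))]   # points never seen in training

pretrained_start = (2.8, 1.5)  # assumed: roughly right, from earlier related data
scratch_start = (0.0, 0.0)     # no prior knowledge at all

tuned_from_pretrained = fine_tune(pretrained_start, new_information)
tuned_from_scratch = fine_tune(scratch_start, new_information)

print("pretrained start:", mean_squared_error(tuned_from_pretrained, held_out))
print("scratch start:   ", mean_squared_error(tuned_from_scratch, held_out))
```

Both copies end up fitting the one new example almost perfectly. The difference shows up on the unseen points: a single example cannot pin down two parameters, so whatever the model “knew” before training fills the gap, and the copy that started from a sensible place lands much closer to the truth.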

More pertinently, in real life, depending upon our “generic domain knowledge” of different domains, the amount of information that you and I will need to learn a certain amount about a certain domain can be very very different.

Everything is context-sensitive!