House husbanding

On Friday I spent my first ever full day as a stay-at-home father. It was rather overwhelming. The daughter is now at an age where she’s learnt to both sit and crawl, and wants to try and stand up holding whatever support she can find. And on that very day, she developed a fascination for sockets, which are at floor level in our house.

So the morning was spent just making sure she wasn’t trying to reach out into a socket (and one of them is right next to the heater), or hurting herself in other ways. Putting her in the middle of her toys didn’t help – those toys, it seems, aren’t half as interesting as the kitchen floor or the sockets. And so I kept running.

Presently it was time for her breakfast. There’s this Heinz porridge we’ve found which she doesn’t seem to mind, and I tried feeding her that. Midway through her breakfast, she refused to open her mouth, and started crying. I figured it was time for her to sleep, and put her on my chest. She was soon snoring.

That one time, it wasn’t much of a challenge to transfer her from my chest to her crib (it’s usually an issue, and she cries as soon as I move her away from me). And the little time she slept gave me an opportunity to shit, shower, shave and have my breakfast. Presently she woke up, presenting that cute smile of hers, and it was running all over again.

The second third of the day was the hardest. She was sleepy, and I was supposed to make formula milk for her! And making formula milk is a real bitch, in terms of cleaning the bottles, heating water to the right temperature, etc. I somehow managed it with the background noise of a screaming baby. And then she drank and slept. I was only halfway through making my lunch when she woke up crying (I ultimately ate some old rice with curd for my lunch).

There were many points during the day when I almost gave up, except that there was no bailout – the wife was far away at work, in meetings. I cried a couple of times when the daughter wouldn’t sleep. I sometimes screamed back when she screamed. I nearly went mad.

And then, in the final third, she became normal once again. She’d rediscovered her toys, and sat in the middle of them, playing. She banged out some jazz tunes on the ancient keyboard we’ve set up on the floor for her. And she felt so happy when I carried her on my shoulders, and demanded that I do it, again, and again, and again!

Making her sit on my shoulders makes her so happy!

Finally the wife returned early from work to provide me a bailout, and then cooked dinner for me, and asked me to go out, in order to compensate me for the troubles during the day!

Suddenly, after that day, my respect for the wife shot up, for having taken care of the daughter mostly by herself for the first five months. My contention back then was that she was on “maternity leave” (though she was yet to start work, and though she was running Marriage Broker Auntie then), so it should be okay for her to take care of the baby. My contention had also been that since it was relatively easy for her to feed the baby (no need to prep bottles, heat water, mix formula, etc.), and comfort her, it was okay for her to take care of the baby alone.

One day of house-husbanding, however, has changed my perspective on this. Babies demand a LOT of attention, and the only way you can do this job well is if you completely give up on doing anything at all of your own in that time (including cooking or eating your meals). And it can be bloody exhausting – though it’s possible that with experience you learn to manage things!

So yes, massive respect now for the wife for having taken care of the baby all by herself for the first five months, when I’d be mostly out either working or meeting people or other such stuff! She is awesome!

The high cost of “relaxing” activities

So I have a problem. I can’t seem to enjoy movies any more. I’ve written about this before. My basic problem is that I end up correctly guessing the plots of most movies that I watch (how many storylines are there anyway? According to Kurt Vonnegut, there are six story arcs).

So as I watch movies, I know exactly what is going to happen. And just continuing to watch the movie waiting for that to happen is simply a waste of time – it adds no information content to me.

The result is that I’m extremely selective about the kinds of movies I watch. Some genres, such as Westerns, work because even if the stories may be predictable, the execution is not, and that makes for interesting watching.

Then, of course, there are directors who have built up a reputation of being “offbeat”, where you can expect that their movies don’t follow expected story arcs – their movies have enough information content to make them worth watching.

And most “classic” movies (take any of the IMDB Top 250, for example) have stories that are told in an extremely compelling fashion – sometimes you might know what happens, but the way things are built up implies that you don’t want to miss watching it happening.

Now, all this is fine, and something I’ve written about before. The point of this post is that while I feel this way about movies, my wife doesn’t feel the same way. She watches pretty much anything, even if the stories are utterly predictable.

For example, she’s watched at least 100 Telugu movies (though, admittedly, during a particularly jobless stretch in her MBA when she was watching loads of movies, even she got bored of the predictability of Telugu movies and switched to Tamil instead!). She likes to watch endless reruns of 90s Kannada movies that now appear rather lame (to me). She especially loves chick flicks, which I think have excess redundancy built into them for a very specific reason.

I don’t have a problem with any of this! In fact, I’m damn happy that she has a single-player hobby that enables her to keep herself busy when she’s bored. The only little problem I have is that she believes it is romantic to watch movies together. She might sell video for Amazon for a living, but she surely is a fan of “netflix and chill” (more the literal meaning than the euphemistic one).

And that is a problem for me, since I find the vast majority of movies boring and predictable, and she thinks the kind of movies I like are “too serious” and “not suitable for watching together” – an assessment I don’t disagree with (though I did make her watch For a Few Dollars More with me a couple of months back).

I’d prefer to spend the time we have together, when we’re not talking, on other activities – reading, for example (reading offers significantly higher throughput than movies, and that, I think, is a result of formats of several lengths being prevalent – newspaper articles, longform articles, books, etc.). I’ve offered to watch movies with her on the condition that I read something at the same time – an offer that has been soundly rejected (and I understand her reasons for that).

And so we reach a deadlock, and it repeats every time we have time and want to chill. She wants to watch movies together. I initially agree, and then back out when presented with a choice of movies to watch. Sometimes I put myself through it, thoroughly not enjoying the process. Other times, much to her disappointment, we end up not watching.

Clearly there are no winners in this game!

How power(law)ful is your job?

A long time back I’d written about how different jobs are sigmoidal to different extents – the most fighter jobs, I’d argued, have linear curves – the amount you achieve is proportional to the amount of effort you put in. 

And similarly I’d argued that the studdest jobs have a near vertical line in the middle of the sigmoid – indicating the point when insight happens. 

However what I’d ignored while building that model was that different people can have different working styles – some work like Sri Lanka in 1996 – get off to a blazing start and finish most of the work in the first few days. 

Others work like Pakistan in 1992 – put ned for most of the time and then suddenly finish the job at the last minute. Assuming a sigmoid does injustice to both these strategies, since neither of these curves can easily be described by a sigmoidal function. 

So I revise my definition, and in order to do so, I use a concept from the 1992 World Cup – highest scoring overs. Basically take the amount of work you’ve done in each period of time (period can be an hour or day or week or whatever) and sort it in descending order. Take the cumulative sum. 

Now make a plot with an index on the X axis and the cumulative sum on the Y axis. The curve will look like that of a Pareto (80-20) distribution. Now you can estimate the power law exponent, and curves that are steeper in the beginning (greater amount of work done in fewer days) will have a lower power law exponent. 
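Here is a minimal Python sketch of the sorted-cumulative-sum procedure. The post doesn’t say how the exponent should be estimated, so the fit is an assumption: both axes are normalised to run from 0 to 1, the curve is modelled as y = x^a, and a is estimated by a least-squares fit of log y on log x through the origin. On this scale, evenly spread work gives a = 1 (the linear “fighter” curve), and the more the work is bunched into a few periods, the closer a gets to 0. The function name `stud_exponent` is hypothetical.

```python
import math


def stud_exponent(work_per_period):
    """Estimate a in y = x ** a for the sorted cumulative-work curve.

    Assumption (not from the post): x is the fraction of periods, y the
    fraction of total work, and a is fitted by least squares on the logs,
    constrained to pass through the point x = y = 1 (the origin in log space).
    """
    work = sorted(work_per_period, reverse=True)  # "highest scoring overs" first
    n, total = len(work), sum(work)
    if n < 2 or total <= 0 or work[-1] < 0:
        raise ValueError("need two or more periods of non-negative work, not all zero")

    xs, ys = [], []
    cumulative = 0.0
    # Skip the last point: it is always (1, 1), i.e. log 0 on both axes
    for i, w in enumerate(work[:-1], start=1):
        cumulative += w
        xs.append(math.log(i / n))
        ys.append(math.log(cumulative / total))

    # Slope of a least-squares line through the origin: sum(x * y) / sum(x * x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


print(stud_exponent([1, 1, 1, 1]))   # evenly spread work: 1.0
print(stud_exponent([10, 1, 1, 1]))  # front-loaded work: well below 1
```

Because the periods are sorted before anything else, a Sri Lanka 1996 style blazing start and a Pakistan 1992 style last-minute finish with the same amounts of work get the same exponent, which is what the revised definition is meant to achieve.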

And this power law exponent can tell you how stud or fighter the job is – the lower the exponent the more stud the job!! 

Newsletter!

So after much deliberation and procrastination, I’ve finally started a newsletter. I call it “the art of data science” and the title should be self-explanatory. It’s pure unbridled opinion (the kind that usually goes on this blog), except that I only write about one topic there.

I intend to have three sections and then a “chart of the edition” (note how cleverly I’ve named this section to avoid giving much away on the frequency of the newsletter!). This edition, though, I ended up putting in too much harikathe (long-winded storytelling), so I restricted it to two sections before the chart.

I intend to talk a bit each edition about some philosophical part of dealing with data (this section got a miss this time), a bit on data analysis methods (I went a bit meta on this this time) and a little bit on programming languages (which I used for bitching a bit).

The fact that I plan to have a “chart of the edition” means I need to read newspapers a lot more, since you are much more likely to find gems (in either direction) there than elsewhere. For the first edition, I picked a good graph I’d seen on Twitter, and it’s about Hull City!

Anyway, enough of this meta-harikathe. You can read the first edition of the newsletter here. In case you want to get it in your inbox each week/fortnight/whenever I decide to write it, then subscribe here!

And put feedback (by email, not comments here) on what you think of the newsletter!

High dimension and low dimension data science

I’ve observed that there are two broad approaches that people take to getting information out of data. One approach is to simply throw a kitchen sink full of analytical techniques at the data. Without really trying to understand what the data looks like, and what the relationships may be, the analyst simply uses one method after another to try and get insight from the data. Along the way, a “model” will get built.

The other approach (which I’m partial to) involves understanding each variable, and the relationships between variables, as a first step to getting insight from the data. Here, too, a model might get built, but it will be conditional on the analyst’s view on what kind of a model might suit the data after looking at the data.

Considering that both these approaches are used by large numbers of analysts, it is highly likely that both are legitimate. Then what explains the fact that some analysts use one approach, and others use another? Having thought about it for a long time, I have a hypothesis – it depends on the kind of data being analysed. More precisely, it has to do with the dimensionality of the data.

The first approach (which one might classify as “machine learning”) works well when the data is of high dimensions – where the number of variables that can be used as predictors is really large, of the order of thousands or larger. For example, even a seemingly low-resolution 32 by 32 pixel image, looked at as a data point, has 1024 dimensions (the colour at each of the 1024 pixels is a different dimension).
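To make the 1024 figure concrete, here is a small Python sketch that turns a 32 by 32 image into a single data point. It assumes one intensity value per pixel (an image stored as separate red, green and blue channels would have three times as many dimensions), and the pixel values are made up purely for illustration.

```python
def flatten(image):
    """Turn a 2-D grid of pixel values into one flat feature vector, row by row."""
    return [value for row in image for value in row]


# A hypothetical 32 x 32 greyscale image with one made-up intensity per pixel
image = [[(r * 32 + c) % 256 for c in range(32)] for r in range(32)]
features = flatten(image)
print(len(features))  # 1024: each pixel is one dimension of the data point
```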

Moreover, in such situations, it is likely that the signal in the data doesn’t come from one, two or a handful of predictors. In high dimension data science, the signal usually comes from a complex interplay of data along various dimensions. And this kind of search is not something humans are fit for – it is best that the machines are left to “learn” the model by themselves, and so you get machine learning.

On the other hand, when the dimensionality of the dataset is low, it is possible (and “easy”) for an analyst to look at the interplay of factors in detail, and understand the data before going on to build the model. Doing so can help the analyst identify patterns in the data that may not be that apparent to a machine, and it is also likely that in such datasets, the signal lies with data along a small number of dimensions, where relatively simple manipulation will suffice. The low dimensionality also means that complex machine learning techniques are unlikely to contribute much in such cases.

As you might expect, from an organisational perspective, the solution is quite simple – to deploy high-dimension data scientists on high-dimension problems, and likewise with low-dimension data scientists. Since this distinction between high-dimension and low-dimension data scientists isn’t very well known, it’s quite possible that the scientists might be made to work on a problem of dimensionality that is outside of their comfort zone.

So when you have low dimensional data scientists faced with a large number of dimensions of data, you will see them use brute force to try and find signals in bivariate relationships in the data – an approach that will never work since the signal lies in a more complex interplay of dimensions.
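A toy Python sketch of why a bivariate search can come up empty, on made-up data. The assumed signal is an exclusive-or of two binary predictors: neither predictor is correlated with the target on its own, yet a rule that looks at both together reproduces the target exactly. Real high-dimensional signals are messier, but the failure mode is the same.

```python
from itertools import product


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: the target is 1 when exactly one of the two predictors is 1 (XOR)
rows = list(product([0, 1], repeat=2))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

# Bivariate search: each predictor on its own has zero correlation with the target
print(pearson(x1, y), pearson(x2, y))  # 0.0 0.0

# A rule on both predictors jointly (x1 != x2) recovers the target exactly
print(all((a != b) == bool(t) for a, b, t in zip(x1, x2, y)))  # True
```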

On the other hand, when you put high dimension data scientists on a low dimension problem, you will either see them missing out on associations that a human could easily find but a machine might struggle to find, or they might unnecessarily “reduce the problem to a known problem” by generating and importing large amounts of data in order to turn it into a high dimension problem!

PS: I wanted to tweet this today but forgot. Basically, you use logistic regression when you think the signal is an “or” of conditions on some of the underlying variables. On the other hand, if you think the signal is more likely to be an “and” of certain conditions, then you should use decision trees!

Tiered equity structure and investor conflict

About this time last year, I’d written this article for Mint about optionality in startup valuations. The basic idea there was that any venture capital investment into startups usually comes with “dirty terms” that seek to protect the investor’s capital.

So you have liquidation preferences that demand that the external investors get paid out first (according to a pre-decided formula) in case of a “liquidity event” (such as an IPO or an acquisition). You also have “ratchets”, which seek to protect an investor’s share in the company in case the company raises a subsequent round at a lower valuation.
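A minimal Python sketch of why a liquidation preference behaves like a put option. It assumes the simplest common form, a 1x non-participating preference held by a single investor: at exit, the investor takes the larger of their money back (capped at the exit value) or their pro-rata share as common stock. Actual term sheets vary a lot (multiples, participation, seniority across rounds), and the function names and figures here are hypothetical.

```python
def preferred_payout(exit_value, invested, stake):
    """1x non-participating preference: the better of the preference or converting to common."""
    as_common = stake * exit_value
    preference = min(invested, exit_value)  # can't take out more than the exit is worth
    return max(preference, as_common)


def embedded_put(exit_value, invested, stake):
    """The part of the payout that comes from the preference rather than the equity stake.

    This is a put-like payoff, max(preference - as_common, 0), and it is funded
    by everyone else on the cap table (founders, employees, earlier investors).
    """
    as_common = stake * exit_value
    return max(min(invested, exit_value) - as_common, 0)


# Hypothetical round: 10 invested for a 25% stake, i.e. a "headline" post-money of 40
for exit_value in (8, 20, 40, 100):
    payout = preferred_payout(exit_value, 10, 0.25)
    rest = exit_value - payout  # what is left for everyone else
    print(exit_value, payout, rest, embedded_put(exit_value, 10, 0.25))
```

The put only pays out when the exit comes in below the headline valuation, which is exactly the scenario the people writing it (everyone else on the cap table) would rather not think about.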

These “dirty terms” are nothing but put options written by existing investors in a firm in favour of the new investors. And these options telescope. So the Series A round has options written by founders, employees and seed investors, in favour of Series A investors. At the time of Series B, Series A investors move to the short (writing) side of the options, which are written in favour of Series B investors. And so forth.

There are many reasons such clauses exist. One venture capitalist told me that his investors have similar optionality on their investments in his funds, and it is only fair he passes them on. Another told me that “good entrepreneurs” believe in their idea so much that they don’t want to even consider the thought that their company may not do well – which is when these options pay out, and so they are happy to write these options. And then, an embedded option can inflate the “headline valuation” of a company, which is something some founders want.

In any case, in my piece for Mint I’d written about such optionality leading to potential conflicts among investors in different classes of stock, which might sometimes be a hindrance to further capital raises. Quoting from there,

The latest round of investors usually don’t mind a “down round” (an investment round that values the company lower than the preceding round) since their ratchets protect them, but earlier investors are short such ratchets, and don’t want to see their stakes diluted. Thus, when a company is unable to find investors who are willing to meet its current round of valuation, it can lead to conflict between different sets of investors in the company itself.
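The mechanism in the quote can be sketched with a full ratchet, the most aggressive form of anti-dilution protection (milder weighted-average versions also exist). The Python below uses a hypothetical cap table, not Snapdeal’s or anyone’s actual terms: when a down round comes in below the Series A price, Series A’s conversion price is reset to the new price, so the extra shares it receives come out of everyone else’s ownership.

```python
from fractions import Fraction


def ownership_after_down_round(founder_shares, a_invested, a_price,
                               b_invested, b_price, full_ratchet=True):
    """Ownership fractions after a Series B, optionally with a full ratchet for Series A."""
    a_shares = Fraction(a_invested) / a_price
    if full_ratchet and b_price < a_price:
        # Full ratchet: Series A is treated as if it had bought at the new, lower price
        a_shares = Fraction(a_invested) / b_price
    b_shares = Fraction(b_invested) / b_price
    total = founder_shares + a_shares + b_shares
    return {
        "founders": founder_shares / total,
        "series_a": a_shares / total,
        "series_b": b_shares / total,
    }


# Hypothetical numbers: 1,000,000 founder shares; Series A put in 1,000,000 at 2 per share;
# Series B puts in 500,000 at 1 per share (a down round)
for ratchet in (False, True):
    stakes = ownership_after_down_round(1_000_000, 1_000_000, 2, 500_000, 1, ratchet)
    print(ratchet, {k: float(v) for k, v in stakes.items()})
```

Without the ratchet this prints founders 0.5, Series A 0.25, Series B 0.25; with it, founders 0.4, Series A 0.4, Series B 0.2. Series A comes out of the down round with a bigger stake than it had before, while the founders, who are short the ratchet, absorb the dilution, which is why the two sides can end up on opposite ends of a fundraise.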

And now Mint reports that such conflicts are a main reason for Indian e-commerce biggie Snapdeal’s recent struggles, which have led to massive layoffs and a delay in funding. The story has played out exactly as I’d written in the paper last year.

Softbank, which invested last in Snapdeal and is long put options on the company’s value, is pushing the company to raise more funds at a lower valuation. However, Nexus and Kalaari, who had invested earlier and stand to lose significantly thanks to these options, are resisting such moves. And the company continues to stall.

I hope this story provides entrepreneurs and venture capitalists sufficient evidence that dirty terms can affect everyone up and down the chain, and can actually harm the business’s day-to-day operations. The cleaner a company keeps the liabilities side of the balance sheet (by having a small number of classes of equity), the better it is in the long run.

But then with Snap having IPOd by offering only non-voting shares to the public, I’m not too hopeful of equity truly being equitable any more!

Explaining the lack of dishwashers in India

For the last four weeks, after landing in Britain, we’ve been using the dishwasher fairly regularly. On average, we run it once a day, and the vessels come out of it nice and shiny – to an extent that is nearly impossible when you wash them by hand. Last year when we were in Spain, too, we used the dishwasher fairly often.

Considering the convenience (all your dishes done in one go, and coming out nice and shiny), I’ve been wondering why the dishwasher hasn’t taken off in India. The requirement for water and electricity doesn’t explain it – the near-ubiquity of the washing machine in upper middle class households suggests that is not that much of a problem. It’s not a function of our using steel plates, either – if that were the only constraint, people would have switched plates to get the benefit of this convenience.

The real answer lies in the archaic concept of the enjil (saliva; known as jooTa in Hindi), and theories on how saliva can get transmitted and contaminate stuff. To be fair, it’s a useful concept in that it doesn’t allow anyone’s germ-bearing saliva to contaminate things around them, except for roads and sidewalks, that is! Specifically, the enjil concept ensures that food doesn’t get remotely contaminated by someone’s saliva. But it takes things a bit too far.

For example, sharing plates, even when you’re using separate spoons (let’s say when sharing dessert at a restaurant), is taboo. When you double-dip your spoon into the plate, germs from your saliva get transmitted there, and can potentially contaminate people you are sharing your food with. Or so the theory goes. The exceptions are in childhood, where a child is allowed to share plates with the mother, and after marriage, when couples are allowed to share plates! Go figure how that works.

Similarly, traditional Indians eschew the dining table, and the concept of keeping serving bowls on the same surface as plates. Again, the concept is that saliva can somehow “transmit” from the plates to the serving bowls and contaminate everyone’s food.

Next, there is an elaborate protocol to deal with used plates. They are not supposed to be washed in the same sink as other vessels. Yes, you read that right. When I was growing up, the protocol for used plates was to first rinse them in the bathroom (after throwing leftover food in the dustbin) before dropping them in the sink. It didn’t matter how well you rinsed the plate in the bathroom – the fact that water had fallen on it after use meant that it was now purified, and fit to sit with all the other unwashed vessels.

Now consider the dishwasher. To achieve economies of scale at the household level, and to ensure vessels don’t pile up, you put all kinds of vessels in it at the same time – plates, spoons, forks, serving bowls and cooking vessels! In other words, “saliva-bearing” dishes are put into the same contraption at the same time as “saliva-free” cooking dishes, and the “same water” is used to wash all of them together.

And that clearly violates all prudent practices of saliva management and contamination avoidance that we have all grown up with! And trust me, it takes time to get over such instinctive practices. And so I predict that it will be at least another generation (20 years or so) before there are enough households with adults who grew up without a strong concept of enjil, and who might be willing to give the dishwasher a try!