Smashing the Law of Conservation of H

A decade and a half ago, Ravikiran Rao came up with what he called the “law of conservation of H”. The concept has to do with the South Indian practice of adding an “H” to denote a soft consonant, a practice not shared by North Indians (Karthik instead of Kartik, for example). This practice, Ravikiran claims, is balanced by the “South Indian” practice of using “S” instead of “Sh”, because of which the number of Hs in a name is conserved.

Ravikiran writes:

The Law of conservation of H states that the total number of H’s in the universe will be conserved. So the extra H’s that are added when Southies have to write names like Sunitha and Savitha are taken from the words Sasi and Sri Sri Ravisankar, thus maintaining a balance in the language.

Using data from the Bangalore first names data set (warning: very large file), it is clear that this theory doesn’t hold water, in Bangalore at least. For what the data shows is that not only do Bangaloreans love the “th” and “dh” for the soft T and D, they also use “sh” to mean “sh” rather than use “s” instead.

The most commonly cited examples of LoCoH are Swetha/Shweta and Sruthi/Shruti. In both cases, the former is the supposed “South Indian” spelling (with th for the soft T, and S instead of sh), while the latter is the “North Indian” spelling. As it turns out, in Bangalore, both these combinations are rather unpopular. Instead, it seems like if Bangaloreans can add an H to their name, they do. This table shows the number of people in Bangalore with different spellings for Shwetha and Shruthi (now I’m using the dominant Bangalorean spellings).

As you can see, Shwetha and Shruthi are miles ahead of any of the alternate ways in which the names can be spelt. And this heavy usage of H can be attributed to the way Kannada incorporates both Sanskrit and Dravidian history.

Kannada has a pretty large vocabulary of consonants. Every consonant has both the aspirated and unaspirated version, and voiced and unvoiced. There are three different S sounds (compared to Tamil which has none) and two Ls. And we need a way to transliterate each of them when writing in English. And while capitalising letters in the middle of a word (as per Harvard Kyoto convention) is not common practice, standard transliteration tries to differentiate as much as possible.

And so, since aspirated Tha and Dha aren’t that common in Kannada (except in the “Tha-Tha” symbols used by non-Kannadigas to show raised eyes), th and dh are used for the dental letters. And since Sh exists (and in two forms), there is no reason to substitute it with S (unlike Tamil). And so we have H everywhere.

Now, lest you were to think that I’m using just two names (Shwetha and Shruthi) to make my point, I dug through the names dataset to see how often names with interchangeable T and Th, and names with interchangeable S and Sh, appear in the Bangalore dataset. Here is a sample of both:

There are 13002 Karthiks registered to vote in Bangalore, but only 213 Kartiks. There are a hundred times as many Lathas as Latas. Shobha is far more common than Sobha, and Chandrashekhar much more common than Chandrasekhar.
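To make the comparison concrete, here is a minimal Python sketch of how the competing spellings of a name could be enumerated before counting them in the voter list. The function name `h_variants` and the rule it applies (keep or drop the h after s, t or d) are my own simplifications for illustration, not the method actually used for the numbers above. It also assumes you start from the spelling with the most Hs.

```python
from itertools import product

def h_variants(name):
    """Return all spellings made by keeping or dropping the h in sh/th/dh."""
    lower = name.lower()
    pieces = []  # one tuple of alternatives per position in the name
    i = 0
    while i < len(lower):
        has_h = i + 1 < len(lower) and lower[i + 1] == "h"
        if lower[i] in "std" and has_h:
            # Two choices here: with the h, or without it.
            pieces.append((lower[i] + "h", lower[i]))
            i += 2
        else:
            pieces.append((lower[i],))
            i += 1
    spellings = {"".join(combo).capitalize() for combo in product(*pieces)}
    return sorted(spellings)

print(h_variants("Shwetha"))
print(h_variants("Karthik"))
```

Each spelling returned can then be looked up in a table of name counts to see which variant dominates.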

So while other South Indians might conserve H, by not using it with S to compensate for using it with T and D, this doesn’t apply to Bangalore. Thinking about it, I wonder how a Kannadiga (Ravikiran) came up with this theory. Perhaps the fact that he has never lived in Karnataka explains it.

The Comeback of Lakshmi

A few months back I stumbled upon this dataset of all voters registered in Bangalore. One quick scraping script and a run later, I had the names, addresses and voter IDs of everyone registered to vote in Bangalore in the state assembly elections held this year.

As you can imagine, this is a fantastic dataset on which we can do the proverbial “gymnastics”. To start with, I’m using it to analyse names in the city, something like what Hariba did with Delhi names. I’ll start by looking at the most common names, and by age.

Now, extracting first names from a dataset of mostly South Indian names is tricky, since South Indians are quite likely to use initials, and to place them before their given names (for example, when in India, I most commonly write my name as “S Karthik”). I decided to treat all words of length 1 or 2 as initials (thus missing out on the “Om”s), and assume that the first word in the name of length 3 or greater is the given name (again ignoring those who put their family names first, or those that have expanded initials in the voter set).
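The rule described above can be sketched in a few lines of Python. The function name `extract_given_name` is hypothetical, and treating full stops as separators (so that “S. Karthik” works too) is an extra assumption of mine rather than something the original script is known to do.

```python
def extract_given_name(full_name):
    """Return the first word of length 3 or more, or None if there is none.

    Words of length 1 or 2 are treated as initials and skipped.
    """
    # Assumption: dots after initials are treated as plain separators.
    for word in full_name.replace(".", " ").split():
        if len(word) >= 3:
            return word
    return None

print(extract_given_name("S Karthik"))         # Karthik
print(extract_given_name("K R Ramesh Kumar"))  # Ramesh
print(extract_given_name("Om"))                # None
```

As noted, this misses two-letter names and picks the wrong word for people who write their family name first.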

The most common male first name in Bangalore, not surprisingly, is Mohammed, borne by 1.5% of all male registered voters in the city. This is followed by Syed, Venkatesh, Ramesh and Suresh. You might be surprised that Manjunath doesn’t make the list. This is a quirk of the way I’ve analysed the data – I’ve taken spellings as given and not tried to group names by alternate spellings.

And as it happens, Manjunatha is in sixth place, while Manjunath is in eighth, and if we were to consider the two as the same name, they would comfortably outnumber the Mohammeds! So the “Uber driver Manjunath(a)” stereotype is fairly well-founded.
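Treating two spellings as one name amounts to mapping each variant to a canonical form before counting. Here is a minimal sketch, assuming a hand-built alias table (`ALIASES` is hypothetical) and a tiny made-up sample rather than the real voter data:

```python
from collections import Counter

# Hypothetical alias table: variant spelling -> canonical spelling.
ALIASES = {"manjunatha": "manjunath"}

def canonical(name):
    """Lower-case the name and fold known variants into one spelling."""
    key = name.lower()
    return ALIASES.get(key, key)

def count_names(names):
    """Count names after folding alternate spellings together."""
    return Counter(canonical(n) for n in names)

# Made-up sample, only to show the mechanics.
sample = ["Mohammed", "Manjunath", "Manjunatha",
          "Mohammed", "Manjunatha", "Syed"]
print(count_names(sample).most_common(2))
```

An explicit table is tedious to build but avoids accidental merges that a blanket rule (say, dropping every trailing “a”) would cause.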

Coming to the women, the most common name is Lakshmi, with about 1.55% of all women registered to vote having that name. Lakshmi is closely followed by Manjula (1.5%), with Geetha, Lakshmamma and Jayamma coming some way behind (all less than 1%) but taking the next three spots.

Where it gets interesting is if we were to look at the most common first name by age – see these tables.
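For anyone who wants to build a similar by-age table, here is a minimal sketch. It assumes the data has already been reduced to (age, first name) pairs; the function name `top_name_by_age` and the sample records are invented for illustration.

```python
from collections import Counter, defaultdict

def top_name_by_age(records):
    """Map each age to its most common first name."""
    by_age = defaultdict(Counter)
    for age, name in records:
        by_age[age][name] += 1
    # most_common(1) returns a one-element list of (name, count).
    return {age: counts.most_common(1)[0][0]
            for age, counts in sorted(by_age.items())}

# Made-up records, only to show the mechanics.
records = [
    (25, "Lakshmi"), (25, "Divya"), (25, "Lakshmi"),
    (35, "Manjula"), (35, "Manjula"), (35, "Lakshmi"),
]
print(top_name_by_age(records))
```

Running this separately for men and women would give one table for each.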

Among men, it’s interesting to note that among the younger age group (18-39, with the exception of 35) and the older age group (57+), Muslim names are the most common, while the intermediate range of 40-56 sees Hindu names such as Venkatesh and Ramesh dominating (if we assume Manjunath and Manjunatha are the same, the combined name comes top in the entire 26-42 age group).

I find the pattern of most common women’s names more interesting. It is interesting to note that the -amma suffix seems to have been done away with over the years (suffixes will be analysed in a separate post), with Lakshmamma turning into Lakshmi, for example.

It is also interesting to note that for a long period of time (women currently aged 30-43), Lakshmi went out of fashion, with Manjula taking over as the most common name! And then the trend reversed, as we see that the most common name among 24-29 year old women is Lakshmi again! And that seems to have gone out of fashion once again, with “modern names” such as Divya, Kavya and Pooja taking over! Check out these graphs to see the trends.

(I’ve assumed Manjunath and Manjunatha are the same for this graph)

So what explains Manjunath and Manjula being so incredibly popular in a certain age range, but quickly falling away on both sides? Maybe there was a lot of fog (manju) over Bangalore for a few years? 😛

The popularity of nicknames and political correctness

It is a rite of passage in an institution such as IIT (Indian Institute of Technology) that a first year student be given a potentially embarrassing nickname following “interaction” with senior students. The profundity of these nicknames varies significantly, with some people simply being given names that correspond to body parts in different languages, while others have more involved names.

Based on a conversation yesterday, the hypothesis is that more profound nicknames which are embarrassing only in a particular context are more likely to propagate, and thus stick, while the more crass names are likely to die out more easily.

The logic is simple – the crass names (a few examples being “lund”, “condom” and “dildo” – there is at least one person with each of these names in every hostel of every batch at IIT Madras) are potentially embarrassing for an “outsider” to use, and to be used in public. So when the bearer of such a name graduates and moves on to a new setting, the new people he encounters make a prudent choice to not use the embarrassing word, and the nickname dies a quick death.

When the nickname is embarrassing or derogatory for more contextual reasons, though, the name quickly loses its context and becomes incredibly simple for people to use. Take my own name “Wimpy”, for example – not too many people know it has an embarrassing origin, and it is a perfectly respectable word to shout out in public, or even in an office setting. And so it has propagated – in at least two offices I worked in, everyone called me “Wimpy”.

It is similar for lots of other “benign” names. But it is unlikely that a name like “condom” or “dildo” will propagate, and it is in fact more likely that even the people who bestowed such names upon the unsuspecting will stop using them once everyone graduates and moves on to a more formal environment.

There are exceptions, of course, a notable one being “Baada“. It is a cuss-word representing a body part, except that it is in a non-standard (though not small by any means) language, but everyone I know calls Baada Baada. He used to be my colleague, and people at work also called him Baada. It is unlikely that his nickname would’ve propagated, though, had it been the synonym in English or Hindi.

Thanks to Katpadi Katsa for discussions leading up to this post. In a future post, I’ll talk about models for propagation of nicknames across institutions.

Python and Hindi

So I’ve recently discovered that using Python to analyse data is, to me, like talking in Hindi. Let me explain.

Back in 2008-9 I lived in Delhi, where the only language spoken was Hindi. Now, while I’ve learnt Hindi formally in school (I got 90 out of 100 in my 10th boards!), and watched plenty of Hindi movies, I’ve never been particularly fluent in the language.

The basic problem is that I don’t know the language well enough to think in it. So when I’m talking Hindi, I usually think in Kannada and then translate my thoughts. This means my speech is slow – even Atal Behari Vajpayee can speak Hindi faster than me.

More importantly, thinking in Kannada and translating means that I can get several idioms wrong (can’t think of particular examples now). And I end up using the language in ways that native speakers don’t (again can’t think of examples here).

I recently realised it’s the same with programming languages. For some 7 years now I’ve mostly used R for data analysis, and have grown super comfortable with it. However, at work nowadays I’m required to use Python for my analysis, to ensure consistency with the rest of the firm.

While I’ve grown reasonably comfortable with using Python over the last few months, I realise that I have the same Hindi problem. I simply can’t think in Python. Any analysis I need to do, I think about it in R terms, and then mentally translate the code before performing it in Python.

This results in several inefficiencies. Firstly, the two languages are constructed differently and optimised for different things. When I think in one language and mentally translate the code to the other, I’m exploiting the efficiencies of the thinking language rather than the efficiencies of the coding language.

Then, the translation process itself can be ugly. What might be one line of code in R can sometimes take 15 lines in Python (and vice versa). So I end up writing insanely verbose code that is hard to read.
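To give a flavour of what “translated” code can look like, here is a small made-up illustration (not code from my actual work). Both halves label numbers as “high” or “low”: the first fills a preallocated list by index, a habit carried over from another language, while the second is what a native Python speaker would more likely write.

```python
values = [3, 8, 1, 9, 4]

# "Translated" style: preallocate a vector and fill it position by position.
flags_translated = [None] * len(values)
for i in range(len(values)):
    if values[i] > 3:
        flags_translated[i] = "high"
    else:
        flags_translated[i] = "low"

# Idiomatic style: a list comprehension with a conditional expression.
flags_idiomatic = ["high" if v > 3 else "low" for v in values]

print(flags_translated == flags_idiomatic)  # True
```

Both give the same answer; the difference is only in how much code you have to read to be sure of that.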

Such code also looks ugly – a “native user” of the language finds it rather funnily written, and will find it hard to read.

A decade ago, after a year of struggling in Delhi, I packed my bags and moved back to Bangalore, where I could both think and speak in Kannada. Wonder what this implies in a programming context!

Censoring the death ceremony

So we finally watched Raam Reddy’s much-acclaimed Thithi today. Ever since we’d watched the trailer, we’d wanted to see the movie, and though reviews from relatives and friends were mixed, they helped set our expectations and we had a good time at the movie.

This post, however, is not about the movie, but about censorship. We watched at PVR Forum, and immediately after the U/A certificate (and before the movie) came a certificate with the cuts that the censor board had recommended. Even before the movie began, we knew that four instances of thika (arse) and one instance of bOLi (bitch) had been muted.

I think this is a fantastic idea – while the censor board is happy to use its scissors liberally, showing how they’ve used their scissors beforehand helps set viewers’ expectations, so that they know exactly what they’ve missed out. My only contention is that that slide should be shown for longer than it was, so that viewers get a better idea.

Anyway, once the movie started, it was clear that the censors had done a shoddy job. As a friend (who watched the movie yesterday) pointed out, the word “tuNNe” (dick) wasn’t muted out. I also noticed during the movie that a line of dialogue translated (and subtitled) as “screw your mother” remained.

(while I initially wondered why a Kannada movie was being shown in Bangalore with English subtitles, I realised once the movie started that it was a good thing. The language used in the movie was quite different from what we normally speak in Bangalore.)

What the censorship of words in this movie goes to illustrate is that the censor board is thoroughly incompetent. Whether censorship is necessary is a philosophical question, and the government has appointed a committee to look into that. What is more important is that the people at the censor board are thoroughly incompetent, and hopefully that will be taken into account when the censorship policy is finally revised!

thika is something every Kannadiga kid uses liberally (though bOLi is something we graduate to only in teens), while tuNNe and nin-amman (translated as “screw your mother”) are normally not used in polite conversation. The censor board is absolutely clueless!

Gandhi

I was playing table tennis in my hostel at IIT with a friend who came from North India. At some point during a rally, the ball hit the edge of the table on his side, and moved far away, giving me the point. I apologised (as you normally do when you win a point by fluke), and said “Gandhi”. He didn’t understand what that meant.

It was then that I realised that using the word “Gandhi” as a euphemism for “fluke” is mostly a Bangalore thing. Back when I played table tennis during my school days, a let was called “Gandhi”, as was a ball hitting the edge of the table. It was the same case with comparable sports such as badminton or tennis or even volleyball. A basket that went in by fluke in basketball was also “Gandhi”.

Now, it might be hard for people to reconcile flukes with MK Gandhi, who was assassinated sixty eight years ago. Some people might also find it repugnant – that the great Mahatma’s name might be used to describe flukes. Looking at it as a fluke, however, is a shallow interpretation.

While it is hard to compare Gandhi (the person) with flukes, it is not hard at all to look at him as a figure of benevolence. He was known for his non-violent methods, and for turning the proverbial “other cheek”. He pioneered the use of non-cooperation as a method of protest (which has unfortunately far outlived its utility) and showed that you could win by being extremely nice. This was channelled in a movie a decade ago which spoke about “Gandhigiri” as a strategy for world domination.

So when the table tennis ball hits the edge of the table and flies off, invoking Gandhi’s name is a sign of benevolence by the person who has lost the point, who implicitly says “you, bugger, didn’t deserve to win this point. But I’ll be benevolent like Gandhi and allow you to take it”. It is similar in other sporting contexts, such as a let or a freak basket.

The invocation of Gandhi’s name as a sign of benevolence is common in other fields as well. In 1991, my cousin had to miss her second standard annual exams as she had to fly to Bangalore on account of the death of the grandfather we shared. Her school, in an act of benevolence, promoted her anyway, an act that was described by other relatives in Bangalore as “Gandhi pass”.

If there is a Gandhi pass, there is a Gandhi class also (again I was surprised to know it’s not a thing in North India). Another of Gandhi’s defining characteristics was the simplicity of his life. Though he could afford to travel better, he would always travel third class, which had the cheapest ticket. As a consequence, the cheapest ticket came to be known as the “Gandhi class”.

The term (Gandhi class) is now most commonly used in the context of cinemas, referring to the front few rows for which tickets are the cheapest. Even though multiplexes have larger blocks nowadays, which means front row tickets are no cheaper than those a few rows behind, the nomenclature sticks. If you are unlucky enough to only get a seat in the first couple of rows, you proudly say you are in “Gandhi class”.

That his name has come to be associated with so many everyday occurrences, mostly in irreverence, illustrates the impact Gandhi has had. Some people might outrage (as the fashion is nowadays) about the irreverence, and “reduction” of Gandhi to these concepts.

I’m still surprised, though, that things like “Gandhi class”, “Gandhi pass” and “Gandhi” as a euphemism for fluke weren’t that prevalent in North India fifteen years ago.

English and phonetic spellings

So my nephew Samvit, who recently turned 4, has learnt to spell. And he has learnt to use a computer (and phone) keyboard. He seems to love the keyboard so much that he apparently refuses to write using pen and paper.

They say that he’s taken after me in many ways (despite us sharing just 1/16 of our genes – he’s my cousin’s son), and I must mention that my writing output exploded after I had learnt to type and got access to a computer keyboard.

The point of this post is not about his writing, however. Yesterday, they made him spell out a few words, and here is how he spelt them out. One thing I might want to disown him for is that he uses all caps. Leaving that aside, the way he spells is extremely interesting. Here is the list of words he spelt out, as emailed to me by his mother:

THIORI
ELEKTRIC
MAGNET
ANTENA
MYKRO
STRIP
HELIX
PERABOLA
DYPOLE
HORN
GYD
COSMIK
PLANET
ANIMUL
DANS
SING
CUK
DRAMA
MUZIC
HOUS
TEMPUL
SOUND
SOFA
WATUR
AEROPLEN
SHIPYARD
GARDUN
CHOKLET
BRED
JUS
BANANA
ORENJ
AVACADO
ORIYO

As you might notice, it’s all very phonetic. He has learnt the English alphabet, and sounds associated with each letter, and then tried to fit that to the words that he has had to type out. It appears weird at first, but then if you take a closer look, you realise that it’s rather intuitive.

He seems to have figured out the polymorphism behind certain letters, for he uses multiple sounds of U in “Jus” (which is how I think it’s spelt in certain European languages, btw) and in “Gardun”. He hasn’t figured out the polymorphism in i-y though, as he says “thiori” and “gyd”.

Then his use of Cs and Ks for the Ka sound is also interesting, as he uses both of them, and he seems to have a certain logic for using them. I’ve been trying to reverse engineer this logic but so far failed. He says “cuk” and “cosmik”, from which you might think he uses “c” when it’s the beginning of a syllable and “k” when it’s the end of a syllable.

But then you also notice that he says “elektric” which throws this hypothesis out of the window. And there is “avacado” and “choklet”.
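One way to hunt for the logic is to tabulate where each C and K falls. Here is a minimal sketch using the seven words from the list above that contain either letter. Note the assumptions: it uses position within the word (start, middle, end) as a crude stand-in for position within the syllable, and it skips a C followed by H, since “ch” is a different sound. The function name `ck_positions` is my own.

```python
from collections import Counter

WORDS = ["ELEKTRIC", "MYKRO", "COSMIK", "CUK",
         "MUZIC", "CHOKLET", "AVACADO"]

def ck_positions(words):
    """Count (letter, position-in-word) pairs for every C and K."""
    tally = Counter()
    for word in words:
        for i, letter in enumerate(word):
            if letter not in "CK":
                continue
            if letter == "C" and word[i + 1:i + 2] == "H":
                continue  # "CH" is a separate sound, so skip it
            if i == 0:
                position = "start"
            elif i == len(word) - 1:
                position = "end"
            else:
                position = "middle"
            tally[(letter, position)] += 1
    return tally

for (letter, position), n in sorted(ck_positions(WORDS).items()):
    print(letter, position, n)
```

A proper test of the syllable hypothesis would need the words split into syllables first, which this sketch does not attempt.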

Overall, though, it is fascinating to see how a four-year-old who has just learnt the language spells. Maybe if we get a bunch of four-year-olds who still haven’t been formally taught spelling to spell, we might understand what English spelling should intuitively be like! It might even be possible that going forward the language may evolve to this new spelling!

Are there any other interesting patterns you notice in the other list of words? Are there any other interesting ways in which you’ve seen other kids spell? What does this mean for the English language – should it be simplified?

Ghoti