The popularity of nicknames and political correctness

It is a rite of passage in an institution such as IIT (Indian Institute of Technology) that a first year student be given a potentially embarrassing nickname following “interaction” with senior students. The profundity of these nicknames varies significantly, with some people simply being given names that correspond to body parts in different languages, while others have more involved names.

Based on a conversation yesterday, the hypothesis is that more profound nicknames which are embarrassing only in a particular context are more likely to propagate, and thus stick, while the more crass names are likely to die out more easily.

The logic is simple – the crass names (a few examples being “lund”, “condom” and “dildo” – there is at least one person with each of these names in every hostel of every batch at IIT Madras) are potentially embarrassing for an “outsider” to use, and to be used in public. So when the bearer of such a name graduates and moves on to a new setting, the new people he encounters make a prudent choice to not use the embarrassing word, and the nickname dies a quick death.

When the nickname is embarrassing or derogatory for more contextual reasons, though, the name quickly loses its context and becomes incredibly simple for people to use. Take my own name “Wimpy”, for example – not too many people know it has an embarrassing origin, and it is a perfectly respectable word to shout out in public, or even in an office setting. And so it has propagated – in at least two offices I worked in, everyone called me “Wimpy”.

It is similar for lots of other “benign” names. But it is unlikely that a name like “condom” or “dildo” will propagate, and it is in fact more likely that even the people who bestowed such names upon the unsuspecting will stop using them once everyone graduates and moves on to a more formal environment.

There are exceptions, of course, a notable one being “Baada”. It is a cuss-word representing a body part, except that it is in a non-standard (though not small by any means) language, but everyone I know calls Baada Baada. He used to be my colleague, and people at work also called him Baada. It is unlikely that his nickname would’ve propagated, though, had it been the synonym in English or Hindi.

Thanks to Katpadi Katsa for discussions leading up to this post. In a future post, I’ll talk about models for propagation of nicknames across institutions.



Diversity and sorting by last name

So the wife graduated today. The graduation ceremony was in threes – three graduates were called at a time and presented their degrees (the wife now claims that she has one more degree than me, since my B-school gave me a Post Graduate Diploma and not a Masters).

It was reminiscent of the swearing in of Ministers of State in India, who take oath four at a time. My graduation ceremonies, where we collected our degrees one at a time, were more like the swearing in of Cabinet Ministers. This simultaneous award of degrees worked well in finishing the ceremony in good time, though.

As is usual in such ceremonies, the graduates had been sorted by name. Except that since this is a global business school, the sorting was done by <Last Name> followed by <First Name> (at all my schools, sorting has been in the opposite order).

This resulted in a fairly hilarious bunching of graduates from different countries at the same point in time. One batch of three was a set of three Lees, for example (rather amazingly, there was not a single Wang in the graduating class). They were followed by two more Lees/Lis. Another set of three were three Japanese who had the same prefix to the last name.

And the wife was one of three Indians in the batch whose last name started with “Bha-”. It’s a rather unique Indian construct, and the three were listed consecutively for graduation. It was only because of a “cut” that occurred in the middle that the three didn’t go simultaneously to receive their degrees.

Different countries have different name forms and the same words might occur as a prefix of a large number of last names from the country. Such prefixes might also be unique to certain countries, thanks to which sorting by last name results in the occurrence of several “country clusters” through the course of the list.

It got me wondering if the diversity of the batch (more than 50 countries were represented in the graduating class of ~300) might have been exhibited better, with people of the same nationality spread more widely through the list, had they done (what is to us Indians) the conventional thing and sorted by first name instead!

What is the feminine of Amit?

“Amit” is a word that is commonly used, often pejoratively, to refer to men from the North of India. The reason for the usage of “Amit” in this context is that while it is an extremely common name for men from North India, it is not as common in other parts of India, and thus it characterises men from North India.

A question that has been floating around in social media circles for a long time in this connection is what the feminine form of “Amit” is. If Amit characterises the median North Indian male, what name characterises the median North Indian female? Popular candidates for this are Neha, Isha and Pooja. Pooja suffers from the fact that it is also a fairly common name in other parts of India. Isha, while it might be strongly North Indian, is too obscure. And for some reason, people are loath to accept Neha as the feminine Amit. So how do we resolve this?

I, being a stud, am a big follower of the Hanuman principle. If you have to solve a problem, and it takes no more effort to solve a generic problem, then solve the generic problem and apply it to this problem as a special instance rather than spending time to solve each instance. Hence, we will rephrase this problem as “What first name uniquely identifies a particular ethnicity?”. I, being a quant, am going to use the quantitative hammer to hammer down this nail. So we can rephrase as “how can we quantitatively characterise ethnicities by first names?”

The first thing to notice is that we need a frame of reference. Amit is a good name to characterise a North Indian man among the universe of Indian men. However, if we define the universe differently, as “Asian” for example, or “men living in Delhi”, Amit may not be as characteristic at all. Hence, any formula that we develop needs to take into account the frame of reference.

Secondly, what makes a name ethnically characteristic? I argue that there are two factors, and these two will be used in deriving the final formula. Firstly, the name should be common among the particular ethnicity – for example, Murugaselvan is extremely characteristic of Tamil men, but its occurrence is so low that using Murugaselvan as the median Tamil man among all Indian men is futile. Secondly, the name should be distinctive for that particular community. For example, a possible competitor to Amit is Rahul, a name that is possibly as common among North Indians as Amit is (I haven’t seen the statistics). The problem with Rahul, however, is that it is a fairly common name in South India also! So it does a bad job in terms of discrimination. So basically what we are looking for is a name that is both popular in the ethnicity we want to characterise, and also characteristic to that particular ethnicity in comparison to the universe.

These two requirements lead to the following rather simple formula (I’m not claiming that this is the best formula – if there is a way to objectively evaluate such formulas, that is – but it is sufficiently good and simple to understand and evaluate). Let our universe be U and the community we are trying to characterise be C. C’ is {U – C} (I’m assuming all of you know set theoretic notation). The first name N that characterises the community C is the one that maximises P(N|C) – P(N|C’). That’s it. Simple.

To explain in English, for each first name, we calculate the incidence of that particular name in the community C. That is, for example, what proportion of North Indian girls are named Neha, Pooja, Isha, Nidhi, etc. Next, we calculate the incidence of the name in the “complement of C”, that is how likely is it that someone in the rest of the “universe” we have defined has the same name. In our above example, we calculate what proportion of Indian but NOT North Indian girls (taking Indian women as the universe) are named Neha, Pooja, Isha, Nidhi, etc. Then, for each name, we subtract the latter quantity from the former quantity and then select the name for which this difference is maximum! Rather simple, I would think!
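For those who prefer code to English, here is a minimal Python sketch of the same calculation. The function name `characteristic_name`, the group labels "north" and "rest", the extra name Lakshmi, and above all the counts in the toy list are invented purely for illustration; they are not real data and say nothing about the actual answer.

```python
from collections import Counter


def characteristic_name(people, community):
    """Score every first name by P(name | C) - P(name | C').

    people    : list of (first_name, group) pairs making up the universe U
    community : the group label C that we want to characterise
    Returns (best_name, scores), where scores maps each name to its difference.
    """
    in_c = Counter(name for name, group in people if group == community)
    out_c = Counter(name for name, group in people if group != community)
    n_in = sum(in_c.values())
    n_out = sum(out_c.values())
    if n_in == 0 or n_out == 0:
        raise ValueError("both C and its complement must be non-empty")

    # A Counter returns 0 for names it has never seen, so a name that is
    # absent from one side simply contributes a probability of zero there.
    scores = {
        name: in_c[name] / n_in - out_c[name] / n_out
        for name in set(in_c) | set(out_c)
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy universe with INVENTED counts (hypothetical, not real data).
toy = (
    [("Neha", "north")] * 4 + [("Pooja", "north")] * 4
    + [("Isha", "north")] * 1 + [("Nidhi", "north")] * 1
    + [("Pooja", "rest")] * 3 + [("Neha", "rest")] * 1
    + [("Lakshmi", "rest")] * 6
)

best, scores = characteristic_name(toy, "north")
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {score:+.2f}")
```

In this made-up universe, a name that is popular in C but also common outside it is pulled down by the subtraction, and a name that is distinctive but rare never scores high in the first place, which is exactly the pair of requirements described above. Changing the universe (the list passed in) changes the scores, which is the frame-of-reference point.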

Now, we need data. Unfortunately I can’t seem to find any publicly available data sets that contain long lists of names along with markers of ethnicity (address or city or state or language preference or some such). If you can help me with some data sets, we can actually run the above formula for different ethnicities and characterise them. It is going to be a fun exercise, I promise! So pour in the data. And I request you to share publicly available data and not proprietary data.

And then we can, once and for all, finish this debate of what the feminine form of Amit is, along with many other fun ethnic classifications.