Shouting, Jumping and Peacock Feathers

The daughter has been ill for nearly the last two weeks, struck by one bacterium after one virus, with a short gap in between. Through her first illness (a stomach bug), she had remained cheerful and happy. And when I had taken her to hospital, she had responded by trying to climb up an abacus they had placed there in the children’s urgent care room.

So when the virus passed and she recovered, the transition was a rather smooth one. The day after she recovered I took her to the park where she jumped and ran around and rode the swing and the slide. Within a day or two after that she was eating normally, and we thought she had recovered.

Only for a bacterium to hit her and lay her low with a throat infection and fever. Perhaps being a stronger creature than the earlier virus, or maybe because it was the second illness in the space of a week, this one really laid her low. She quickly became weak, and rather than responding to “how are you?” with her usual cheerful “I’m good!!”, she started responding with a weak “I’m tired”. As the infection grew worse, she stopped eating, which made her weaker and her fever worse. Ultimately, a trip to the doctor and a course of antibiotics was necessary.

It was only yesterday that she started eating without a fuss (evidently, the antibiotic had started to do its work), and when she made a real fuss about eating her curd rice last night, I was deeply sceptical about how she would get on at her nursery today.

As it happened, she was completely fine, and had eaten all her meals at the nursery in full. And when I got her home in the evening, it seemed like she was fully alright.

She is normally a mildly naughty and loud kid, but today she seemed to make an extra effort in monkeying around. She discovered a new game of jumping off the edge of the sofa on to a pillow placed alongside – a sort of dangerous one that kept us on the edge of our seats. And periodically she would run around quickly and scream at the top of her voice.

To me, this was like a peacock’s feathers – by wasting her energy in unnecessary activities such as jumping and screaming, the daughter was (I think) trying to signal that she had completely recovered from her illness, and that she now had excess energy that she could expend in useless activities.

The upside of all this monkeying around was that soon after I had helped her get through 2-3 books post her dinner, she declared that it was “taachi (sleep) time”, and soon enough was fast asleep. This is significant in that the last few days when she spent all the time at home, her sleep schedule had gotten ruined.

Bangalore names are getting shorter

The Bangalore Names Dataset, derived from the Bangalore Voter Rolls (cleaned version here), validates a hypothesis that a lot of people had – that given names in Bangalore are becoming shorter. From an average of 9 letters in the name for a male aged around 80, the length of the name comes down to 6.5 letters for a 20 year old male. 
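For what it's worth, here is a minimal sketch of how such an average could be computed. It assumes each record is a hypothetical (given_name, age, gender) tuple; the real layout of the dataset, and the code behind the graph, may well differ.

```python
from collections import defaultdict

def mean_name_length_by_age(records):
    """Average given-name length for each (gender, age) group.

    records: iterable of (given_name, age, gender) tuples (assumed layout).
    """
    totals = defaultdict(lambda: [0, 0])  # (gender, age) -> [sum of lengths, count]
    for name, age, gender in records:
        bucket = totals[(gender, age)]
        bucket[0] += len(name)
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Toy data, made up purely for illustration
sample = [
    ("Krishnappa", 80, "M"),
    ("Ramaiah", 80, "M"),
    ("Tejas", 20, "M"),
    ("Karthik", 20, "M"),
]
averages = mean_name_length_by_age(sample)
```

Plotting the resulting averages against age, one line per gender, would give a graph of the kind described here.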

What is interesting from the graph (click through for a larger version) is the difference in lengths of male and female names – notice the crossover around the age 25 or so. At some point in time, men’s names continue to become shorter while women’s names’ lengths stagnate.

So how are names becoming shorter? For one, honorific endings such as -appa, -amma, -anna, -aiah and -akka are becoming increasingly less common. Someone named “Krishnappa” (the most common name with the ‘appa’ suffix) in Bangalore is on average 56 years old, while someone named Krishna (the same name without the suffix) is on average only 44 years old. Similarly, the average age of people named Lakshmamma is 55, while the average Lakshmi (the same name without the suffix) is just 40.

In fact, if we look at the top 12 male and female names with an honorific ending, the average age of the version without the ending is lower than that of the version with the ending. I’ve even graphed some of the distributions to illustrate this.

  In each case, the red line shows the distribution of the longer version of the name, and the blue line the distribution of the shorter version

In one of the posts yesterday, we looked at the most typical names by age in Bangalore. What happens when we flip the question? Can we define what are the “oldest” and “youngest” names? Can we define these based on the average age of people who hold that name? In order to rule out fads, let’s stick to names that are held by at least 10000 people each.
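A rough sketch of this ranking, assuming a hypothetical list of (given_name, age) pairs. The function name and the toy data are illustrative only, not taken from the actual analysis.

```python
from collections import defaultdict

def names_by_average_age(records, min_holders):
    """Return (name, average age) pairs, youngest first.

    Names held by fewer than min_holders people are dropped,
    which is the "rule out fads" cutoff.
    """
    ages = defaultdict(list)
    for name, age in records:
        ages[name].append(age)
    averages = {
        name: sum(values) / len(values)
        for name, values in ages.items()
        if len(values) >= min_holders
    }
    return sorted(averages.items(), key=lambda item: item[1])

# Toy data; the real cutoff in the post is 10000 holders, not 2
sample = [
    ("Karthik", 25), ("Karthik", 31),
    ("Krishnappa", 52), ("Krishnappa", 60),
    ("Dhanush", 19),  # only one holder, so the cutoff removes it
]
ranked = names_by_average_age(sample, min_holders=2)
youngest, oldest = ranked[0], ranked[-1]
```

The "youngest" and "oldest" names are then simply the two ends of the sorted list.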

These graphs are candidates for my own Bad Visualisations Tumblr, but I couldn’t think of a better way to represent the data. These graphs show the most popular male and female names, with the average age of a voter with that given name on the X axis, and the number of voters with that name on the Y axis. The information is all in the X axis – the Y axis is there just so that names don’t overlap.

So Karthik is among the youngest names among men, with the average age among voters being about 28 (remember this is not the average age of all Karthiks in Bangalore – those aged below 18 or otherwise not eligible to vote have been excluded). On the women’s side, Divya, Pavithra and Ramya are among the “youngest names”.

At the other end, you can see all the -appas and -ammas. The “oldest male name” is Krishnappa, with an average age of 56. And then you have Krishnamurthy and Narayana, which don’t have the -appa suffix but represent an old population anyway (the other -appa names just don’t clear the 10000 people cutoff).

More women’s names with the -amma suffix clear the 10000 names cutoff, and we can see that pretty much all women’s names with an average age of 50 and above have that suffix. And the “oldest female name”, subject to 10000 people having that name, is Muniyamma. And then you have Sarojamma and Jayamma and Lakshmamma. And a lot of other ammas.

What will be the oldest and youngest names if we relax the popularity cutoff, and instead look at names with at least 1000 people? The five youngest names are Dhanush, Prajwal, Harshitha, Tejas and Rakshitha, all with an average age (among voters) less than 24. The five oldest names are Papamma, Kannamma, Munivenkatappa, Seethamma and Ramaiah.

This should give another indication of where names are headed in Bangalore!

Reading Boards

Today was a landmark day in the life of the daughter. She looked at a bus this evening, and without any prompting, started trying to read the number on it.

Most of today hadn’t been that great for her. She’s been battling a throat infection for a few days now, and has been largely unable to eat for the last couple of days because of which she had developed high fever today. As a result, we took her to hospital today, and it was on the way back from there that the landmark event happened.

Having got on to the bus at the starting point, we had the choice of seat, and obviously chose the best seat in the house – the seat right above the driver (I’m going to miss double decker buses when we move out of London). She was excited to be in a bus – every day on the way to her nursery, we pass by many buses, prompting her to exclaim “red bus!!” and expressing a desire to ride them. The nursery is five minutes walk away from home, so no such opportunity arises.

I must also mention that we live at a busy intersection, close to the Ealing Broadway “town centre”. From our living room window we can see lots of buses, and the numbers are easily recognisable (it helps that London buses have electronic number boards). And sometimes when Berry refuses to eat, her mother takes her to the window where they watch buses come and go, with one spoonful for each bus. Along the way, the wife reads out the bus numbers aloud to Berry. So far, though, Berry had never tried to read a bus number from our house window.

But sitting in a bus herself this evening, she “broke through”. Ahead of us was bus 427, which she read as “four seven”. I asked her what was in between 4 and 7, and she had no answer. Maybe she didn’t understand “between”.

A short distance later, there was bus 483 coming from the other side. She started with the 3 and then read the 8. And then the bus passed. And then there was bus E1 in front of us. Berry read it as “E”. I hadn’t known that she can recognise E. I know she knows all numbers, and A to D. So this was news to me. Getting her to read the number next to that was a challenge. 1 is a challenge for her since it looks like I. After much prompting, there was nothing, and I told her it was E1. Five minutes later, we encountered 427 again. This time she read it in full, except that she called it “seven two four”.

I grew up at a time when our lives were much less documented. The only solid memory I have of my childhood is this photo album, most of whose photos were taken by an uncle who had a camera, and whose camera had this feature to imprint the date on the photos. So I have a very clear idea about what I looked like at different ages, and what I did when, but the rest of my growing up years were a little fuzzy.

There is the odd memory, though. My grandfather’s younger brother, who lived next door, had a car (a Fiat 1100). I loved going on rides with him in that, and I used to sit between him and my grandfather. I don’t remember too many specific trips, but I know that my grandfather would make me read signboards from shops, and I would read them letter by letter.

My grandfather’s younger brother passed away when I was two years and seven months old. So I know that by the time I was that age, I was able to read letters from signboards.

It is only natural for us to benchmark our children’s growth to that of other people we know – ourselves, if possible, and if not, some cousins or friends’ children. Thus far, I had lacked a marker to know of whether Berry had “beaten me to it” at various life events. I know she started walking quicker than me, because my first year birthday photos show me trying to stand on my own. I know she spoke later than me because multiple people have told me I would speak sentences at the time of our housewarming (when I was a year and a half old).

Thanks to the memory of going on rides with my grandfather’s brother, and reading signboards, I know that I would read them before I was two years seven months old (or maybe earlier, since I’m guessing I did it multiple times in his car else no one would’ve told me about it).

And today, at two years and two months, the daughter started reading numbers on surrounding buses. She doesn’t know the full alphabet yet, but this is a strong start!

I’m proud of her!

Smashing the Law of Conservation of H

A decade and a half ago, Ravikiran Rao came up with what he called the “law of conservation of H”. The concept has to do with the South Indian practice of adding an “H” to denote a soft consonant, a practice not shared by North Indians (Karthik instead of Kartik for example). This practice, Ravikiran claims, is balanced by the “South Indian” practice of using “S” instead of “Sh”, because of which the number of Hs in a name is conserved.

Ravikiran writes:

The Law of conservation of H states that the total number of H’s in the universe will be conserved. So the extra H’s that are added when Southies have to write names like Sunitha and Savitha are taken from the words Sasi and Sri Sri Ravisankar, thus maintaining a balance in the language.

Using data from the Bangalore first names data set (warning: very large file), it is clear that this theory doesn’t hold water, in Bangalore at least. For what the data shows is that not only do Bangaloreans love the “th” and “dh” for the soft T and D, they also use “sh” to mean “sh” rather than use “s” instead.

The most commonly cited examples of LoCoH are Swetha/Shweta and Sruthi/Shruti. In both cases, the former is the supposed “South Indian” spelling (with th for the soft T, and S instead of sh), while the latter is the “North Indian” spelling. As it turns out, in Bangalore, both these combinations are rather unpopular. Instead, it seems like if Bangaloreans can add a H to their name, they do. This table shows the number of people in Bangalore with different spellings for Shwetha and Shruthi (now I’m using the dominant Bangalorean spellings).

As you can see, Shwetha and Shruthi are miles ahead of any of the alternate ways in which the names can be spelt. And this heavy usage of H can be attributed to the way Kannada incorporates both Sanskrit and Dravidian history.

Kannada has a pretty large vocabulary of consonants. Every consonant has both the aspirated and unaspirated version, and voiced and unvoiced. There are three different S sounds (compared to Tamil which has none) and two Ls. And we need a way to transliterate each of them when writing in English. And while capitalising letters in the middle of a word (as per Harvard Kyoto convention) is not common practice, standard transliteration tries to differentiate as much as possible.

And so, since aspirated Tha and Dha aren’t that common in Kannada (except in the “Tha-Tha” symbols used by non-Kannadigas to show raised eyes), th and dh are used for the dental letters. And since Sh exists (and in two forms), there is no reason to substitute it with S (unlike Tamil). And so we have H everywhere.

Now, lest you were to think that I’m using just two names (Shwetha and Shruthi) to make my point, I dug through the names dataset to see how often names with interchangeable T and Th, and names with interchangeable S and Sh, appear in the Bangalore dataset. Here is a sample of both:

There are 13002 Karthiks registered to vote in Bangalore, but only 213 Kartiks. There are a hundred times as many Lathas as Latas. Shobha is far more common than Sobha, and Chandrashekhar much more common than Chandrasekhar.
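Counts like these are easy to produce once you have a list of given names. Here is a minimal sketch, assuming the names are already available as a plain Python list (the toy list below is made up, and the helper name is hypothetical).

```python
from collections import Counter

def spelling_counts(given_names, variants):
    """Count how many people use each of the listed spellings (exact match)."""
    counts = Counter(given_names)
    # Counter returns 0 for spellings that never occur
    return {variant: counts[variant] for variant in variants}

# Toy list, not the real voter data
toy_names = ["Karthik"] * 6 + ["Kartik"] + ["Latha"] * 3
karthik_vs_kartik = spelling_counts(toy_names, ["Karthik", "Kartik"])
latha_vs_lata = spelling_counts(toy_names, ["Latha", "Lata"])
```

Matching here is exact and case sensitive, so any normalisation of case or stray whitespace would have to happen before counting.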

So while other South Indians might conserve H, by not using them with S to compensate for using it with T and D, it doesn’t apply to Bangalore. Thinking about it, I wonder how a Kannadiga (Ravikiran) came up with this theory. Perhaps the fact that he has never lived in Karnataka explains it.

The Comeback of Lakshmi

A few months back I stumbled upon this dataset of all voters registered in Bangalore. A quick scraping script followed by a run later, I had the names and addresses and voter IDs of all voters registered to vote in Bangalore in the state assembly elections held this year.

As you can imagine, this is a fantastic dataset on which we can do the proverbial “gymnastics”. To start with, I’m using it to analyse names in the city, something like what Hariba did with Delhi names. I’ll start by looking at the most common names, and by age.

Now, extracting first names from a dataset of mostly South Indian names is tricky, since South Indians are quite likely to use initials, and place them before their given names (for example, when in India, I most commonly write my name as “S Karthik”). I decided to treat all words of length 1 or 2 as initials (thus missing out on the “Om”s), and assume that the first word in the name of length 3 or greater is the given name (again ignoring those who put their family names first, or those that have expanded initials in the voter set).
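The rule is simple enough to sketch in a few lines. This is an illustration of the heuristic as described, not the actual script; the function name is hypothetical, and treating full stops as separators (so that “S. Karthik” works) is my own assumption.

```python
def extract_given_name(full_name):
    """Return the first word of length 3 or more, or None if there is none.

    Words of length 1 or 2 are treated as initials and skipped.
    """
    # Assumption: dots after initials are separators, e.g. "S. Karthik"
    for word in full_name.replace(".", " ").split():
        if len(word) >= 3:
            return word
    return None

examples = {
    "S Karthik": extract_given_name("S Karthik"),      # initial first
    "Om Prakash": extract_given_name("Om Prakash"),    # "Om" is lost, as noted
    "K R": extract_given_name("K R"),                  # only initials
}
```

The “Om Prakash” case shows the known blind spot: a genuine two-letter name gets mistaken for initials.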

The most common male first name in Bangalore, not surprisingly, is Mohammed, borne by 1.5% of all male registered voters in the city. This is followed by Syed, Venkatesh, Ramesh and Suresh. You might be surprised that Manjunath doesn’t make the list. This is a quirk of the way I’ve analysed the data – I’ve taken spellings as given and not tried to group names by alternate spellings.

And as it happens, Manjunatha is in sixth place, while Manjunath is in 8th, and if we were to consider the two as the same name, they would comfortably outnumber the Mohammeds! So the “Uber driver Manjunath(a)” stereotype is fairly well-founded.

Coming to the women, the most common name is Lakshmi, with about 1.55% of all women registered to vote having that name. Lakshmi is closely followed by Manjula (1.5%), with Geetha, Lakshmamma and Jayamma coming some way behind (all less than 1%) but taking the next three spots.

Where it gets interesting is if we were to look at the most common first name by age – see these tables.

Among men, it’s interesting to note that among the younger age group (18-39, with exception of 35) and older age group (57+), Muslim names are the most common, while the intermediate range of 40-56 sees Hindu names such as Venkatesh and Ramesh dominating (if we assume Manjunath and Manjunatha are the same, the combined name comes top in the entire 26-42 age group).
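A sketch of how a “most common name by age” table could be built, with an optional alias map for merging spellings such as Manjunath and Manjunatha. It assumes a hypothetical list of (given_name, age) pairs, and the counts in the toy data are invented.

```python
from collections import Counter, defaultdict

def most_common_name_by_age(records, aliases=None):
    """Map each age to its most frequent given name.

    aliases: optional dict mapping a spelling to the name it should be
    counted under, e.g. {"Manjunatha": "Manjunath"}.
    """
    aliases = aliases or {}
    per_age = defaultdict(Counter)
    for name, age in records:
        per_age[age][aliases.get(name, name)] += 1
    return {
        age: counter.most_common(1)[0][0]
        for age, counter in sorted(per_age.items())
    }

# Toy data for a single age
sample = (
    [("Mohammed", 30)] * 3
    + [("Manjunath", 30)] * 2
    + [("Manjunatha", 30)] * 2
)
as_spelt = most_common_name_by_age(sample)
merged = most_common_name_by_age(sample, aliases={"Manjunatha": "Manjunath"})
```

With spellings taken as given, Mohammed tops the toy list; once the two Manjunath spellings are merged, the combined name overtakes it, which is the same effect described for the real data.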

I find the pattern of most common women’s names more interesting. It is interesting to note that the -amma suffix seems to have been done away with over the years (suffixes will be analysed in a separate post), with Lakshmamma turning into Lakshmi, for example.

It is also interesting to note that for a long period of time (women currently aged 30-43), Lakshmi went out of fashion, with Manjula taking over as the most common name! And then the trend reversed, as we see that the most common name among 24-29 year old women is Lakshmi again! And that seems to have gone out of fashion once again, with “modern names” such as Divya, Kavya and Pooja taking over! Check out these graphs to see the trends.

(I’ve assumed Manjunath and Manjunatha are the same for this graph)

So what explains Manjunath and Manjula being so incredibly popular in a certain age range, but quickly falling away on both sides? Maybe there was a lot of fog (manju) over Bangalore for a few years? 😛

Hypothesis Testing in Monte Carlo

I find it incredible, and not in a good way, that I took fourteen years to make the connection between two concepts I learnt barely a year apart.

In August-September 2003, I was auditing an advanced (graduate) course on Advanced Algorithms, where we learnt about randomised algorithms (I soon stopped auditing since the maths got heavy). And one important class of randomised algorithms is what is known as “Monte Carlo Algorithms”. Not to be confused with Monte Carlo Simulations, these are randomised algorithms that give a one way result. So, using the most prominent example of such an algorithm, you can ask “is this number prime?” and the answer to that can be either “maybe” or “no”.

The randomised algorithm can never conclusively answer “yes” to the primality question. If the algorithm can find proof that the number is composite (a factor, or some other “witness”), it answers “no” (this is conclusive). Otherwise it returns “maybe”. So the way you “conclude” that a number is prime is by running the test a large number of times. Each run reduces the probability that it is a “no” (since they’re all independent evaluations of “maybe”), and when the probability of “no” is low enough, you “think” it’s a “yes”. You might like this old post of mine regarding Monte Carlo algorithms in the context of romantic relationships.
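To make this concrete, here is a sketch of the Miller-Rabin test, one well-known Monte Carlo primality test (the course may well have used a different one). Note that it says “no” when it finds a witness to compositeness, which need not be an actual factor. The function name and the default of 20 rounds are my own choices.

```python
import random

def is_probably_prime(n, rounds=20, rng=random):
    """Return False if n is certainly composite, True if it is "maybe prime"."""
    if n < 2:
        return False
    # Deal with small primes and their multiples directly
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base for this round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # this round says "maybe"
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break  # this round says "maybe"
        else:
            return False  # a is a witness: conclusive "no"
    return True  # every round said "maybe"

answers = {n: is_probably_prime(n) for n in (97, 221, 561, 7919)}
```

For a composite number, each round wrongly says “maybe” with probability at most 1/4, so after 20 rounds the chance of a false “maybe” is at most 4^-20. A prime number never gets a “no”.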

Less than a year later, in July 2004, as part of a basic course in statistics, I learnt about hypothesis testing. Now (I’m kicking myself for failing to see the similarity then), the main principle of hypothesis testing is that you can never “accept a hypothesis”. You either reject a hypothesis or “fail to reject” it.  And if you fail to reject a hypothesis with a certain high probability (basically with more data, which implies more independent evaluations that don’t say “reject”), you will start thinking about “accept”.

Basically hypothesis testing is a one-sided  test, where you are trying to reject a hypothesis. And not being able to reject a hypothesis doesn’t mean we necessarily accept it – there is still the chance of going wrong if we were to accept it (this is where we get into messy territory such as p-values). And this is exactly like Monte Carlo algorithms – one-sided algorithms where we can only conclusively take a decision one way.
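A small sketch of the “reject” versus “fail to reject” decision, using an exact binomial test of whether a coin is fair. The coin example, the function names and the 5% threshold are all assumptions made for illustration.

```python
from math import comb

def binomial_p_value(heads, flips, p=0.5):
    """Two-sided exact p-value under the hypothesis P(heads) = p.

    Adds up the probabilities of all outcomes that are no more
    likely than the observed one.
    """
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small tolerance so that equally likely outcomes are always included
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))

def decide(heads, flips, alpha=0.05):
    """There is deliberately no "accept" outcome."""
    return "reject" if binomial_p_value(heads, flips) < alpha else "fail to reject"

mild = decide(6, 10)     # 6 heads in 10 flips
extreme = decide(9, 10)  # 9 heads in 10 flips
```

Six heads in ten flips gives a p-value of about 0.75, so we fail to reject fairness, which is not the same as showing the coin is fair. Nine heads in ten gives about 0.02, and we reject.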

So I was thinking of these concepts when I came across this headline in ESPNCricinfo yesterday that said “Rahul Johri not found guilty” (not linking since Cricinfo has since changed the headline). The choice, or rather ordering, of words was interesting. “Not found guilty”, it said, rather than the usual “found not guilty”.

This is again a concept of one-sided testing. An investigation can either find someone guilty or it fails to do so, and the heading in this case suggested that the latter had happened. And it soon became apparent why the headline had been deliberately constructed this way – later it emerged that the decision to clear Rahul Johri of sexual harassment charges was a contentious one.

In most cases, when someone is “found not guilty” following an investigation, it usually suggests that the evidence on hand was enough to say that the chance of the person being guilty was rather low. The phrase “not found guilty”, on the other hand, says that one test failed to reject the hypothesis, but it didn’t have sufficient confidence to clear the person of guilt.

So due credit to the Cricinfo copywriters, and due debit to the product managers for later changing the headline rather than putting a fresh follow-up piece.

PS: The discussion following my tweet on the topic threw up one very interesting insight – Scotland has had a “not proven” verdict in the past for such cases (you can trust DD for coming up with such gems).

Advertising Agencies: From Brokers to Dealers

The Ken, where I bought a year long subscription today, has a brilliant piece on the ad agency business (paywalled) in India. More specifically, the piece is on pricing in the industry and how it is moving from a commissions only basis to a more mixed model.

Advertising agencies perform a dual role for their clients. Apart from advising them on advertising strategy and helping them create the campaigns, they are also in charge of execution and buying the advertising slots – either in print or television or hoardings (we’ll leave online out since the structure there is more complicated).

As far as the latter business (acquisition of slots to place the ad – commonly known as “buying”) is concerned, typically agencies have operated on a commission basis. The fee charged has been to the extent of about 2.5% of the value of the inventory bought.

In financial markets parlance, advertising agencies have traditionally operated as brokers, buying inventory on behalf of their clients and then charging a fee for it. The thrust of Ashish Mishra’s piece in The Ken is that agencies are moving away from this model – and instead becoming what is known in financial markets as “dealers”.

Dealers, also known as market makers, make their money by taking the other side of the trade from the client. So if a client wants to buy IBM stock, the dealer is always available to sell it to her.

The dealer makes money by buying low and selling high – buying from people who want to sell and selling to people who want to buy. Their income is in the spread, and it is risky business, since they bear the risk of not being able to offload inventory they have had to buy. They hedge this risk by pricing – the harder they think it is to offload inventory, the wider they set the spreads.
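A back-of-the-envelope sketch of the two business models. The 2.5% commission is the figure quoted above for ad agencies; every other number is made up, and treating unsold inventory as a total loss is a simplifying assumption.

```python
def broker_fee(inventory_value, commission_rate=0.025):
    """Broker model: a commission on the value of inventory bought for the client."""
    return inventory_value * commission_rate

def dealer_expected_profit(buy_price, sell_price, prob_sold):
    """Dealer model: pay buy_price up front, earn sell_price only if it is resold.

    Unsold inventory is assumed to be worth nothing.
    """
    return prob_sold * sell_price - buy_price

def break_even_sell_price(buy_price, prob_sold):
    """Lowest selling price at which the dealer does not lose money on average."""
    return buy_price / prob_sold

fee = broker_fee(1_000_000)                        # about 25000, with no inventory risk
easy = break_even_sell_price(100, prob_sold=0.95)  # about 105.3
hard = break_even_sell_price(100, prob_sold=0.80)  # about 125
```

The harder the inventory is to offload (the lower prob_sold), the higher the break-even price, which is the “wider spread” in the paragraph above.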

Similarly, going by the Ken story, what ad agencies are nowadays doing is to buy inventory from media companies, sell it on to the clients, and make money on the spread. And clients aren’t taking too well to this new situation, subjecting the ad agencies to audits.

From a market design perspective, there is nothing wrong in what the ad agencies are doing. The problem is due to their transition from brokers to dealers, and their clients not coming to terms with the fact that dealers don’t normally have a fiduciary responsibility towards their clients (unlike brokers who represent their clients). There are also local monopoly issues.

The main service that a dealer performs is to take the other side of the trade. The usual mechanism is that the dealer quotes the prices (both buy and sell) and then the client has the option to trade. If the client feels the dealer is ripping her off, she has a chance to not do the deal.

And in this kind of a situation, the price at which the dealer obtained the inventory is moot – all that matters to the deal is the price that the dealer is willing to sell to the client at, and the price that competing dealers might be charging.

So when clients of ad agencies demand that they get the inventory at the same price at which the agencies got it from the media, they are effectively asking for “retail goods at wholesale rates” and refusing to respect the risk that the dealers might have taken in acquiring the inventories (remember the ad agencies run the risk of inventories going unsold if they price them too high).

The reason for the little turmoil in the ad agency industry is that it is an industry in transition – where the agencies are moving from being brokers to being dealers, and clients are in the process of coming to terms with it.

And from one quote in the article (paywalled, again), it seems like the industry might as well move completely to a dealer model from the current broker model.

Clients who are aware are now questioning the point of paying a commission to an agency. “The client’s rationale is that is that it is my money that is being spent. And on that you are already making money as rebate, discount, incentive and reselling inventory to me at a margin, so why do I need to pay you any agency commissions? Some clients have lost trust in their agencies owing to lack of transparency,” says Sodhani.

Finally, there is the issue of monopoly. Dealers work best when there is competition – the clients need to have an option to walk away from the dealers’ exorbitant prices. And this is a bit problematic in the advertising world since agencies act as their clients’ brokers elsewhere in the chain – planning, creating ads, etc.

However, the financial industry has dealt with this problem: most large banks function as both brokers and dealers. It’s only a matter of time before the advertising world goes down that path as well.

PS: you can read more about brokers and dealers and marketplaces and platforms in my book Between the Buyer and the Seller