Good vodka and bad chicken

When I studied Artificial Intelligence, back in 2002, neural networks weren’t a thing. The limited compute capacity and storage available at that point in time meant that most artificial intelligence consisted of what is called “rule based methods”.

And as part of the course we learnt about machine translation, and the difficulty of getting the implicit meaning across. The favourite example among computer scientists at the time was the story of how some scientists translated “the spirit is willing but the flesh is weak” into Russian using English-Russian translation software, and then converted it back into English using Russian-English translation software.

The result was “the vodka is excellent but the chicken is not good”.

While this joke may not be valid any more thanks to the advances in machine translation, aided by big data and neural networks, the issue of translation is useful in other contexts.

Firstly, speaking in a language that is not your “technical first language” makes you eschew jargon. If you have been struggling to get rid of jargon from your professional vocabulary, one way to get around it is to speak more in your native language (which, if you’re Indian, is unlikely to be your technical first language). Devoid of the idioms and acronyms that you normally fill your official conversation with, you are forced to think, and this practice of talking technical stuff in a non-usual language will help you cut your jargon.

There is another use case for using non-standard languages – dealing with extremely verbose prose. A number of commentators, a large number of whom are rather well-reputed, have this habit of filling their columns with flowery language, GRE words, repetition and rhetoric. While there is usually some useful content in these columns, it gets lost in the language and idioms and other things that would make the columnist’s high school English teacher happy.

I suggest that these columns be given the spirit-flesh treatment. Translate them into a non-English language, get rid of redundancies in sentences and then translate them back into English. This process, if the translators are good at producing simple language, will remove the bluster and make the column much more readable.

Speaking in a non-standard language can also make you get out of your comfort zone and think harder. Earlier this week, I spent two hours recording a podcast in Hindi on cricket analytics. My Hindi is so bad that I usually think in Kannada or English and then translate the sentence “live” in my head. And as you can hear, I sometimes struggle for words. Anyway here is the thing. Listen to this if you can bear to hear my Hindi for over an hour.

The Comeback of Lakshmi

A few months back I stumbled upon this dataset of all voters registered in Bangalore. One quick scraping script and a run later, I had the names, addresses and voter IDs of everyone registered to vote in Bangalore in the state assembly elections held this year.

As you can imagine, this is a fantastic dataset on which we can do the proverbial “gymnastics”. To start with, I’m using it to analyse names in the city, something like what Hariba did with Delhi names. I’ll start by looking at the most common names, and by age.

Now, extracting first names from a dataset of mostly South Indian names is not straightforward, since South Indians are quite likely to use initials, and to place them before their given names (for example, when in India, I most commonly write my name as “S Karthik”). I decided to treat all words of length 1 or 2 as initials (thus missing out on the “Om”s), and assume that the first word in the name of length 3 or greater is the given name (again ignoring those who put their family names first, or those that have expanded initials in the voter set).
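The rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual script I ran: the function name first_name is made up for this example, and treating full stops as separators (so that “S. Karthik” behaves like “S Karthik”) is an extra assumption that the rule above doesn’t mention.

```python
from typing import Optional


def first_name(full_name: str) -> Optional[str]:
    """Return the first word with 3 or more letters, or None if every word is short.

    Words of length 1 or 2 are treated as initials and skipped.
    """
    # Assumption: full stops only separate initials, so treat them as spaces.
    for word in full_name.replace(".", " ").split():
        if len(word) >= 3:
            return word
    return None


# Made-up sample names, not records from the voter rolls.
samples = ["S Karthik", "K. R. Manjunatha", "Om Prakash", "Lakshmi"]
extracted = [first_name(name) for name in samples]
```

Note how “Om Prakash” comes out as “Prakash”, which is exactly the kind of miss the rule accepts.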

The most common male first name in Bangalore, not surprisingly, is Mohammed, borne by 1.5% of all male registered voters in the city. This is followed by Syed, Venkatesh, Ramesh and Suresh. You might be surprised that Manjunath doesn’t make the list. This is a quirk of the way I’ve analysed the data – I’ve taken spellings as given and not tried to group names by alternate spellings.

And as it happens, Manjunatha is in sixth place, while Manjunath is in eighth, and if we were to consider the two as the same name, they would comfortably outnumber the Mohammeds! So the “Uber driver Manjunath(a)” stereotype is fairly well-founded.
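To see how much the ranking depends on whether spellings are merged, here is a small sketch. The list of names is a toy one invented for illustration (it is not the voter data), and the ALIASES table and the name_shares function are hypothetical names; a real alias table would need to be built by hand or with some fuzzy matching.

```python
from collections import Counter

# Hypothetical alias table: variant spelling -> spelling to count it under.
ALIASES = {"Manjunatha": "Manjunath"}


def name_shares(first_names, aliases=None):
    """Return {name: share of all names}, most common first."""
    aliases = aliases or {}
    counts = Counter(aliases.get(name, name) for name in first_names)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.most_common()}


# Toy data, made up so that the merged spellings overtake the top name.
toy = ["Mohammed"] * 3 + ["Manjunatha"] * 2 + ["Manjunath"] * 2 + ["Syed"]

as_given = name_shares(toy)          # spellings taken as given
merged = name_shares(toy, ALIASES)   # variant spellings pooled
```

With spellings taken as given, Mohammed tops the toy list; once the two spellings are pooled, Manjunath does.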

Coming to the women, the most common name is Lakshmi, with about 1.55% of all women registered to vote having that name. Lakshmi is closely followed by Manjula (1.5%), with Geetha, Lakshmamma and Jayamma coming some way behind (all less than 1%) but taking the next three spots.

Where it gets interesting is if we were to look at the most common first name by age – see these tables.

Among men, it’s interesting to note that in the younger age group (18-39, with the exception of 35) and the older age group (57+), Muslim names are the most common, while the intermediate range of 40-56 sees Hindu names such as Venkatesh and Ramesh dominating (if we assume Manjunath and Manjunatha are the same, the combined name comes top in the entire 26-42 age group).

I find the pattern of most common women’s names more interesting. It is interesting to note that the -amma suffix seems to have been done away with over the years (suffixes will be analysed in a separate post), with Lakshmamma turning into Lakshmi, for example.

It is also interesting to note that for a long period of time (women currently aged 30-43), Lakshmi went out of fashion, with Manjula taking over as the most common name! And then the trend reversed, as we see that the most common name among 24-29 year old women is Lakshmi again! And that seems to have gone out of fashion once again, with “modern names” such as Divya, Kavya and Pooja taking over! Check out these graphs to see the trends.

(I’ve assumed Manjunath and Manjunatha are the same for this graph)

So what explains Manjunath and Manjula being so incredibly popular in a certain age range, but quickly falling away on both sides? Maybe there was a lot of fog (manju) over Bangalore for a few years? 😛

The one bit machine

My daughter is two weeks old today and she continues to be a “one bit machine”. The extent of her outward communication is restricted to a maximum of one bit of information. There are basically two states her outward communication can fall under – “cry” and “not cry”, and given that the two are not equally probable, the amount of information she gives out is strictly less than one bit.
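The “strictly less than one bit” claim is just the Shannon entropy of a two-state source. A minimal sketch follows; the 20% crying probability is a made-up number for illustration, since I haven’t measured the real one.

```python
import math


def binary_entropy(p_cry: float) -> float:
    """Entropy in bits of a source that emits "cry" with probability p_cry."""
    if p_cry in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    p_not = 1.0 - p_cry
    return -(p_cry * math.log2(p_cry) + p_not * math.log2(p_not))


equiprobable = binary_entropy(0.5)  # the only case that reaches one full bit
mostly_quiet = binary_entropy(0.2)  # hypothetical: she cries 20% of the time
```

Any probability other than one half gives less than one bit, which is why an unevenly split cry/not-cry signal conveys less than a full bit.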

I had planned to write this post two weeks back, the day she was born, and wanted to speculate how long it would take for her to expand her repertoire of communication and provide us with more information on what she wants. Two weeks in, I hereby report that the complexity of communication hasn’t improved.

Soon (I don’t know how soon) I expect her to start providing us more information – maybe there will be one kind of cry when she’s hungry, and another when she wants her diaper changed. Maybe she’ll start displaying other methods of outward communication – using her facial muscles, for example (right now, while she contorts her face in a zillion ways, there is absolutely no information conveyed), and we can figure out with greater certainty what she wants to convey.

I’m thinking about drawing a graph with age of the person on the X axis, and the complexity of outward information on the Y axis. It starts off with X = 0 and Y = 1 (I haven’t bothered measuring the frequency of cry/no-cry responses so let’s assume it’s equiprobable and she conveys one bit). It goes on to X = 14 days and Y = 1 (today’s state). And then increases with time (I’m hoping).

While I’m sure research exists some place on the information content per syllable in adult communication, I hope to draw this graph sometime based on personal observation of my specimen (though that would limit it to one data point).

Right now, though, I speculate what kind of shape this graph might take. Considering it has so far failed to take off at all, I hope that it’ll be either an exponential (short-term good but long-term I don’t know) or a sigmoid (more likely I’d think).

Let’s wait and see.

Languages as memes

A while back on this blog I had compared religious and cultural practices to memes (in the original Richard Dawkins sense of the word). Back then I had written:

So if you were to look at it in terms of responsibility to society, you need to propagate only those cultural traits that you deem to be relevant and important. “So what if everyone stops celebrating Ganesh Chaturthi?” you may ask. If that would happen that would simply mean a vote of no confidence for the festival and an indication that the festival needs to be phased out. If everyone were to propagate only those cultural traits they find useful, traits that a significant proportion of society finds significant will continue to survive and thrive. For Ganesh Chaturthi to exist 30 years hence, it isn’t necessary for ALL families that have inherited it to celebrate it now. As long as a critical mass of families celebrate it, the festival will survive. If not, it probably doesn’t need to exist.

Now, thinking about it, you can consider language to also be a meme. When a bunch of you find that there is a concept for which the language you speak in has no word, you invent a word and add it to the language (this is like a genetic mutation). If enough people like this mutation (i.e. if it is “fit”) it will propagate, and soon become part of the language.

If there is a word in the language that is archaic and not useful for describing any of the phenomena that you are likely to encounter, you stop using it. When people stop using such words, they become “archaic” (ok I see circular reasoning in this paragraph) and effectively drop out of the language. Thus, a living language is always dynamic, receptive to new words (to describe concepts that earlier didn’t need description) and receptive to discarding words that are not useful any more. Thus, the feature that defines a living language is dynamism and change.

This has several policy implications.

1. The concept of “purity” of language is wrong. Some people want to speak in the “pure form” of a language. As long as it is a language that has been truly alive (and not kept alive mostly by ancient literature) there exists no “pure form”, for the definition of a successful language involves frequent “mutations”. So if you ask me to talk in “pure Kannada” it is nonsense. Pure Sanskrit, on the other hand, has some meaning, for the language has been so little used that it’s stopped evolving and mutating.

2. People like to appoint themselves guardians of culture and dictate top-down what words should be part of a particular language. For example, there exists a body under the Government of Karnataka (if I’m not wrong) which dictates what “Kannada words” must be used for different new concepts. This is wrong, and a recipe for such words not being used.

Instead, “memetics” must be respected and evolution must be bottom up. People find the need to describe phenomena around themselves and if they don’t find a word in their language that describes it, they will either invent or borrow one such word. Some such new words become widely used, at which point of time they can be introduced into the language dictionary. Usage should precede presence in the dictionary, not the other way round.

3. “Slang” is a part of language, and a leading indicator of how the language is going to evolve. It should be encouraged and not denounced. For it exists because the language as it stands now cannot effectively enough describe certain concepts.

I’m currently reading this book called The Information by James Gleick, which has a chapter or two dedicated to languages and dictionaries. It was while reading it that I realised how languages are memes.

Switching languages

I used to marvel at how whenever I was in the company of other people from IIT Madras, I would instinctively switch to speaking “IITese”. Words such as “slisha”, “peace”, “rod”, and all others that I would not normally use in normal English when speaking to normal people would suddenly appear in my vocabulary while talking to others from IITM.

I used to consider myself special that I could discriminate thus, and make best use of the languages I know while not discriminating against people who didn’t understand one of the languages, such as IITese. I used to consider this great, but this bubble got broken when my nephew started talking.

This guy is half-Kannadiga, half-Marathi, has a Gult nanny, and his parents speak to each other in Hindi. He is now three years old and for over a year now he’s been very comfortable speaking Kannada and Marathi, and to an extent Telugu, Hindi and English (which he’s learning in school)! The most remarkable thing with him, though, (as with all other multilingual kids, I would imagine) is that he has mapped people to languages. For example, he knows that I speak Kannada and he speaks to me only in Kannada. And while talking to me if his father (who is Marathi) is present, he immediately switches to Marathi to talk to him. Across languages that are very different, he is able to switch easily and seamlessly and moreover know who speaks which language!

There is a downside, though. Once when his mother, who is “supposed to speak to him in Kannada”, tried talking to him in Marathi, he got really angry and wild and asked her to speak in Kannada! Our initial thought was he was being finicky, but I now think it is to do with parsing. When his mother speaks, he has his “Kannada parser” switched on, and if she speaks Marathi, there is a parsing error and it causes great stress on his processor to switch languages. And being a small kid, that makes him cranky and wild!

In other words, this can be considered as another case of Bayesian recognition! It seems like the human mind’s parsing of speech is influenced by the prior distribution of what language the speaker is speaking in. As the first few words come out, we firm up which parser to use, and then it is smooth sailing. For a kid, though, it seems like the prior distribution of parsers is “binary” (one 1, and the rest 0s), which is what makes the wrong-speaker, wrong-language combo annoying for them!
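The difference between a soft prior and a “binary” one can be sketched with a single Bayes update. Everything numeric here is invented for illustration: the priors, the likelihoods of hearing a clearly Marathi word under each parser, and the function name update are all assumptions, not measurements of how anyone’s brain works.

```python
def update(prior, likelihood):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalised = {lang: prior[lang] * likelihood.get(lang, 0.0) for lang in prior}
    evidence = sum(unnormalised.values())
    if evidence == 0.0:
        raise ValueError("the heard word is impossible under this prior")
    return {lang: weight / evidence for lang, weight in unnormalised.items()}


# Hypothetical chance of hearing this (clearly Marathi) word under each parser.
marathi_word = {"Kannada": 0.01, "Marathi": 0.9}

# Hypothetical priors for "which language will this speaker use?"
adult_prior = {"Kannada": 0.9, "Marathi": 0.1}  # soft: leaves room for surprise
kid_prior = {"Kannada": 1.0, "Marathi": 0.0}    # "binary": one 1, the rest 0s

adult_posterior = update(adult_prior, marathi_word)
kid_posterior = update(kid_prior, marathi_word)
```

With the soft prior, one Marathi word is enough to move most of the weight to the Marathi parser. With the binary prior, the zero never becomes anything other than zero, so no amount of evidence can switch the parser.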

We human beings are smarter than we think!

Language

For millions of years
Mankind lived
Just like the animals

And then something happened
That unleashed the power of our imagination
We learned to talk

(from Pink Floyd’s Keep Talking from Division Bell)

And then we moved to a place where no one speaks any of the languages we speak. And we became animals again.

This trip to Barcelona is the first time I’ve spent a reasonable length of time in a place where no one speaks any of the languages that I speak. And I’ve been literally feeling like an animal again, absolutely incapable of communicating, pointing at things and using sign language. It seems like my experience here has been significantly diminished given my inability to speak any of the languages spoken here.

I learnt to talk Kannada when I was perhaps one, or max two. I learnt English in a year or two after that. And then my language learning stopped. I had Hindi as my second language in school, and somehow struggled through it despite scoring 90 out of 100 in my board exam (shows how pointless board exams are). I can understand Hindi, and watch Hindi movies, but I still can’t speak fluently. When I have to speak Hindi, I construct a sentence in Kannada and then translate it. And I speak it with a heavy Kannada accent, much to the mirth of people around.

I have a Bihari cook in Bangalore. He claims to know Kannada but I’ve never tried testing that. And I try speaking to him in Hindi. It is almost like we use sign language. I point to a set of ingredients and tell him the name of what I want to eat. He cooks, and buzzes off. At least talking face to face is fine. There are occasions when I have to call him and give him instructions (“come early tomorrow” or “come late today” or “don’t come today” or some such). It is a nightmare.

It’s not like I’m absolutely bad at languages – I can pick up words quite easily. Thanks to football watching I’ve learnt a fair bit of European history and geography and culture, and through the process I’ve learnt a fair number of words (they’re of the kind of trequartista, regista, tornante, etc but European words nevertheless). I know words in several languages. Just that I have this inability to learn grammar, or how words are put together to form sentences and communicate thoughts (except of course in English and Kannada).

Fourteen years back I went to IIT Madras, and half the people in my class were Gult. That meant I had the opportunity to pick up a fair bit of both Telugu and Tamil. I did neither. I can understand both languages a fair bit, but my understanding of the languages can be described as “assembly language”. I know words and what they mean. I listen for such keywords in what people are saying and interpret based on that. And when I speak these languages, it is based on keywords – I just say out the noun and the root form of the verb and expect the other person to interpret. I’ve never managed to get beyond this!

So there are these bakeries near where I live which might have already marked me off as a weird animal who just walks in and out of them. I go in, survey what they have and if something looks interesting point to that. They pack it for me, and then tell me a number. I ask for the bill – so that I can read the number, or just give them a large enough note and trust them to return me the exact change. When nothing looks interesting to me in the display I can’t talk and ask them for what I want. I just look around (perhaps like a bakery dog) and just walk away. I don’t know how to say “Sorry I don’t know what I want”, or “Thank you, but I don’t find anything interesting here”. And I’ve been visiting some of these places multiple times, doing the same thing!

The level of discourse we are reduced to when we are unable to communicate is rather remarkable! It’s like we can simply not unleash the power of imagination, it is like going back to living like animals. I don’t like it, but I don’t know how to remedy it – I simply can’t pick up new languages!

An economic view of state splits

Most commentators prefer to couch the Andhra Pradesh split in emotive terms – the people of Telangana thought they were being treated in an inferior manner by the people of Andhra, and hence wanted to break away to form a separate state, and that the people of the Rest-of-Andhra (RoA) did not want the split because of reasons of Telugu pride. This is wrong, and over-complicates the issue.

Insight: When something can be explained with simple economic reasoning, looking for other (emotional/psychological/social/…) reasons is futile.

Telangana is the region of Andhra Pradesh that had been part of the Hyderabad state (RoA was part of the Madras Presidency). Due to differing standards of governance in the “Provinces” and the “Princely States”, at the time of independence, Telangana was backward compared to RoA. It didn’t help matters that Telangana was at the receiving end of brutality by the Nizam’s Razakars during the year or so when Hyderabad state was not yet part of India.

Given the vastly differing levels of development in Telangana and RoA, and the differing cultural backgrounds, I’m not sure it made economic sense to unite them in the 1950s. Potti Sriramulu’s fast and subsequent death, however, turned the issue emotive, and there was no room for rational reasoning. And a united Andhra Pradesh was created in 1956. In any case, it was consistent with the mantra of the day to have linguistic states – administrative unwieldiness be damned.

Unless there is a concerted effort, in the natural order of things, when you have a rich part and a poor part of a particular state or country, the rich part can be expected to grow faster than the poor part – no malice here, it is simple network effects. Andhra Pradesh was no exception to this rule, and soon Telangana was much more backward than RoA.

The state splits in 2000, when Uttarakhand, Jharkhand and Chhattisgarh were formed, show that richer parts of states usually don’t mind letting go of the poorer parts. This is especially true if there is a feeling that taxes collected predominantly from the richer parts are being used to disproportionately fund the poorer parts. There might be some emotional attachment, but economics usually rules. Then, why is it that there is so much opposition to hiving off the poorer parts of Andhra Pradesh?

Andhra Pradesh has this unique situation where the capital city (and by far the biggest city) Hyderabad is located in the poorer portion. Hyderabad being the capital city saw significant investments from people from all over Andhra Pradesh, including RoA. A significant portion of RoA investment in Hyderabad is in real estate. Now, with real estate regulation being a state subject, people normally don’t want to hold too much real estate investment outside of their home states – since they will have no control over the politics of those states. So RoA investors are freaking out that their long-term investments in Hyderabad will soon be in a different state.

In a situation such as this, prudent investors might want to pull out (rather than risk their capital in a neighbouring, and possibly hostile, state). However, the problem is that none of the investors want to set off a downward price spiral in Hyderabad. Think of the situation as one where you are an investor from RoA who is invested in Hyderabad, and you know that other investors may want to pull out at any time. But you don’t want to start the process of pulling out, since that can reduce the value of your other holdings. So you stay invested. And hope that the bubble will never burst.

The attempt by people of RoA to hold on to Telangana (Hyderabad, specifically, they don’t really care about the rest of Telangana), is an attempt to save their capital locked up in investments in Hyderabad. If Andhra Pradesh doesn’t split, nobody from RoA will want to pull out their capital from Hyderabad, and the drop in value won’t happen.

It must be pointed out here that the Congress Party’s solution of having Hyderabad as a joint capital for a number of years is unlikely to be much compensation, if Telangana holds jurisdiction over Hyderabad territory anyway. The capital of RoA investors in Hyderabad will still be under risk in that case. The other option proposed was to make Hyderabad a Union Territory, but that wouldn’t help either – since the influence of RoA in the politics of Hyderabad would still be minuscule.

To summarize, the reason RoA doesn’t want to let go of Telangana is that its people don’t want to lose political control over their own investments – in the city of Hyderabad. Issues such as “state pride” and “Telugu pride” are secondary – they have been drummed up just to get the support of the non-elite who may not be economically affected by the state division. In fact, if Telugu pride were so important why in the first place would the Gults of Telangana want a separate state?

From a policy standpoint, it is important to not let the discourse of language pride get in the way of forming smaller states. The only considerations that should matter are economics and administrative efficiency.