Python and Hindi

So I’ve recently discovered that using Python to analyse data is, to me, like talking in Hindi. Let me explain.

Back in 2008-9 I lived in Delhi, where the only language spoken was Hindi. Now, while I learnt Hindi formally in school (I got 90 out of 100 in my 10th boards!), and have watched plenty of Hindi movies, I’ve never been particularly fluent in the language.

The basic problem is that I don’t know the language well enough to think in it. So when I’m talking Hindi, I usually think in Kannada and then translate my thoughts. This means my speech is slow – even Atal Behari Vajpayee can speak Hindi faster than me.

More importantly, thinking in Kannada and translating means that I can get several idioms wrong (can’t think of particular examples now). And I end up using the language in ways that native speakers don’t (again can’t think of examples here).

I recently realised it’s the same with programming languages. For some 7 years now I’ve mostly used R for data analysis, and have grown super comfortable with it. However, at work nowadays I’m required to use Python for my analysis, to ensure consistency with the rest of the firm.

While I’ve grown reasonably comfortable with using Python over the last few months, I realise that I have the same Hindi problem. I simply can’t think in Python. Any analysis I need to do, I think about it in R terms, and then mentally translate the code before performing it in Python.

This results in several inefficiencies. Firstly, the two languages are constructed differently and optimised for different things. When I think in one language and mentally translate the code to the other, I’m exploiting the efficiencies of the thinking language rather than the efficiencies of the coding language.

Then, the translation process itself can be ugly. What might be one line of code in R can sometimes take 15 lines in Python (and vice versa). So I end up writing insanely verbose code that is hard to read.

Such code also looks ugly: a “native user” of the language would find it rather funnily written, and would struggle to follow it.
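To make the translation problem concrete, here is a toy sketch in Python. It is purely illustrative: the data, the function names and the task (average fare per city) are all invented for this example, and the "translated" version is only a guess at what index-by-index thinking carried over from another language tends to look like. Both functions compute the same thing.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (city, fare) pairs, invented for illustration.
rides = [("Bangalore", 120), ("Chennai", 200), ("Bangalore", 80), ("Chennai", 260)]


def mean_fare_translated(rides):
    """Index-driven style, as if ported line by line from another language."""
    cities = []
    totals = []
    counts = []
    for i in range(len(rides)):
        city = rides[i][0]
        fare = rides[i][1]
        if city not in cities:
            cities.append(city)
            totals.append(0)
            counts.append(0)
        j = cities.index(city)
        totals[j] = totals[j] + fare
        counts[j] = counts[j] + 1
    result = {}
    for j in range(len(cities)):
        result[cities[j]] = totals[j] / counts[j]
    return result


def mean_fare_idiomatic(rides):
    """Same result, leaning on tuple unpacking and the standard library."""
    by_city = defaultdict(list)
    for city, fare in rides:
        by_city[city].append(fare)
    return {city: mean(fares) for city, fares in by_city.items()}


print(mean_fare_translated(rides))
print(mean_fare_idiomatic(rides))
```

The first version works, but it is the kind of code a native Python speaker would read with a raised eyebrow; the second says the same thing in a quarter of the lines.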

A decade ago, after a year of struggling in Delhi, I packed my bags and moved back to Bangalore, where I could both think and speak in Kannada. Wonder what this implies in a programming context!

R, Windows, Mac, and Bangalore and Chennai Auto Rickshaws

R on Windows is like a Bangalore auto rickshaw, R on Mac is a Chennai auto rickshaw. Let me explain.

For a long time now I’ve been using R for all my data management, manipulation, analysis and what not. Till two months back I did so on a Windows laptop and a desktop. The laptop had 8 GB of RAM and the desktop had 16 GB. I would handle large datasets, and sometimes, when I tried to do something complicated that needed more memory than the computer had, the process would fail with an error along the lines of “cannot allocate vector of size X GB”. On Windows, R would not creep into the hard disk, into virtual memory territory.

In other words it was like a Bangalore auto rickshaw, which plies mostly on meter but refuses to come to areas that are outside the driver’s “zone”. A binary decision. A yes or a no. No concept of price discrimination.

The Mac, which I’ve been using for the last two months, behaves differently. This one has only 8 GB of RAM, but I’m able to handle large datasets without ever running out of memory. How is this achieved? By using the system’s virtual memory. The system doesn’t run out of memory, and I haven’t received the “can’t allocate memory” error even once on this Mac.

The catch is that virtual memory (despite the machine having an SSD) is painfully slow: it takes the program much longer to read from and write to it than it does with main memory. This means that processes that need more than 8 GB of RAM (I frequently end up running such queries) do execute, but take a really long time to do so.

This is like Chennai auto rickshaw drivers, who never say “no” but make sure they charge a price that will well compensate them for the distance and time and trouble and effort, and a bit more.
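Whichever auto you end up hailing, it helps to know roughly how far you are asking it to go. Below is a back-of-the-envelope sketch in Python for guessing whether a purely numeric table will fit in RAM. It assumes 8 bytes per value (a double-precision number) and, as an arbitrary rule of thumb that is not from any documentation, that only half the RAM is really usable for one dataset. The function names and the example table sizes are made up for illustration.

```python
def estimated_size_gb(rows, cols, bytes_per_value=8):
    """Rough in-memory size of a purely numeric table, in GB (10**9 bytes).

    Assumes every cell is one double-precision value (8 bytes) and ignores
    any overhead added by the language or library holding the data.
    """
    return rows * cols * bytes_per_value / 1e9


def fits_in_ram(rows, cols, ram_gb, headroom=0.5):
    """True if the table fits within the assumed usable share of RAM.

    headroom=0.5 is an arbitrary assumption: analysis usually needs
    working copies, so only half the RAM is counted as available.
    """
    return estimated_size_gb(rows, cols) <= ram_gb * headroom


# Hypothetical example: 50 million rows by 40 columns on an 8 GB machine.
print(estimated_size_gb(50_000_000, 40))  # 16.0
print(fits_in_ram(50_000_000, 40, ram_gb=8))  # False
print(fits_in_ram(1_000_000, 40, ram_gb=8))  # True
```

A "False" here means either a flat refusal (the Bangalore outcome) or a very expensive ride through virtual memory (the Chennai outcome), depending on the machine.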