Channel Coding Theorem in Real Life

One of my favourite concepts in Computer Science is Shannon’s Channel Coding Theorem. This theorem is basically about the efficiency of communication over a noisy channel. And as I realised while thinking about it a few minutes back, it has interesting implications in real life as well, well away from the theory of communication.

I don’t have that much understanding of the rigorous formulation of the theorem. However, I absolutely love its central idea – that the noisier a channel is, the more redundancy you need in your communication, and thus the slower your communication becomes. A corollary of this is that every channel has a “natural maximum speed”, and as long as you communicate within that speed, you can communicate reliably.

I won’t go into the technical details here – they involve assuming that the channel loses (or garbles) X% of bits, and then constructing a redundant code to show that even with this loss, you can communicate reliably.
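To make that concrete, here’s a toy simulation in R – not the theorem itself, just a made-up repetition code over a binary symmetric channel – showing the trade-off: the noisier the channel, the more repetition you need for a given reliability, and the lower your effective throughput.

```r
# Toy sketch: send each bit k times over a channel that flips bits with
# probability p, decode by majority vote. More repetition = fewer errors,
# but throughput falls to 1/k. (Purely illustrative numbers.)
set.seed(1)

send_with_repetition <- function(bits, k, p) {
  flips   <- matrix(rbinom(length(bits) * k, 1, p), nrow = k)   # 1 = bit flipped
  sent    <- matrix(rep(bits, each = k), nrow = k)
  decoded <- as.integer(colMeans((sent + flips) %% 2) > 0.5)    # majority vote
  c(error_rate = mean(decoded != bits), throughput = 1 / k)
}

bits <- rbinom(10000, 1, 0.5)
send_with_repetition(bits, k = 1, p = 0.1)   # full speed, but ~10% of bits garbled
send_with_repetition(bits, k = 5, p = 0.1)   # ~1% errors, at a fifth of the speed
```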

Anyway, let’s leave behind the theory of communication and go on to real life.

I’ve found that I communicate badly when I’m not sure what language to talk in. If I’m talking in English with someone who I know knows good English, I communicate rather well (like my writing 😛). However, if I’m not sure about the quality of language of the other person, I hesitate. I try to force myself to find simpler / more obvious words, and that disturbs my flow of thought, and I stammer.

Similarly, when I’m not sure whether to talk in Kannada or English (the two languages I’m very comfortable in), I stammer heavily. Again, because I’m not sure if the words I would naturally use will be understood by the other person (the counterparty’s comprehension being the “noise in the channel” here), I slow down, get jittery, and speak badly.

Then of course, there is the very literal interpretation of the channel coding theorem – when your internet connection (or call quality in general) is bad, you end up having to speak slower. When I was hunting for a job in 2020, I remember doing badly in a few interviews because of the quality (or lack thereof) of the internet connection (this was before I had discovered that Google Meet performs badly on Safari).

Similarly, sometime last month, I thought I had prepared well for what was going to be a key conversation at work. The internet was bad, we couldn’t hear each other and kept repeating ourselves (redundancy is how you overcome the noise in the channel), and that diminished throughput massively. Given the added difficulty in communication, I didn’t bring up the key points I had prepared. It was a damp squib.

Related to this is when you aren’t sure if the person you are speaking to can hear you clearly. Their difficulty in hearing again clouds the communication channel, meaning you need to build in redundancy, which again cuts your throughput.

When you are uncertain of yourself, or underconfident, you tend to do badly. That is because when you are uncertain, you aren’t sure if the other person will fully understand what you are going to say. Consequently, you end up talking slower, building redundancy into your speech, and so on. You are more doubtful of what you are going to say, and don’t take risks, since your lack of confidence has clouded the “communication channel”, thus depressing your throughput.

Again a lot of this might apply to me alone – I function best when I’m talking / writing at a certain minimum throughput, and operating at anywhere below that makes me jittery and underconfident and a bad communicator. It is no surprise that my writing really took off once I got a computer of my own.

That was in the beginning of July 2004, and within a month, I had started (the predecessor of) this blog. I’ve been blogging for 19 years now.

That aside aside, the channel coding theorem works in non-verbal contexts as well. Back in 2016, before my daughter was born, I remember reading somewhere that tentative mothers lead to cranky babies. The theory was that if the mum was anxious or afraid while handling her baby, the baby wouldn’t perceive the signals of touch clearly enough, and, deprived of that communication, would become cranky.

We had seen a few examples of this among relatives and friends (and this possibly applies to me as well – my mother had told me that I was the first newborn she ever handled, and so she was a bit tentative in handling me). This again can be explained using the Channel Coding Theorem.

When the mother’s touch is tentative, it is as if the touch channel between mother and child has some “noise”. The tentativeness of the touch means the baby is not really sure of what the mother is “saying”. With touch, unlike language or bits, redundancy is harder. And so the child grows up insufficiently connected to its mother.

Conversely, later on in life, these tentative mothers tend to bring in redundancy in their communications with their (now jittery) children, and end up holding them too hard, and not letting them go (and some of these children go to therapists, who inevitably blame it on the mothers 😛 ). Ultimately, all of this stems from the noise in the initial communication channel (thanks to the tentativeness of the source).

Ok I’ve rambled on here, so will stop now. However, now that I’ve seeded this thought in you, you too will start seeing the channel coding theorem everywhere (oh – if you think this post is badly written, then that is again like reading this over a noisy channel. And you will get irritated with the lack of throughput and pack).

Code Density

As many of the regular readers of this blog know, I largely use R for most of my data work. There have been a few occasions when I’ve tried to use Python, but have found that I’m far less efficient in that than I am with R, and so abandoned it, despite the relative ease of putting things into production.

Now in my company, like in most companies, people use both Python and R (the team that reports to me largely uses R, everyone else largely uses Python). And while till recently I used to claim that I’m multilingual in the sense that I can read Python code fairly competently, of late I’m not sure I am. I find it increasingly difficult to parse and read production-grade Python code.

And now, after some experiments with ChatGPT, and exploring other people’s code, I have an idea of why I’m finding it hard to read production-grade Python code. It has to do with “code density”.

Of late I’ve been experimenting with Spark (finally, in this job I do a lot of “big data” work – something I never had to in my consulting career prior to this). Related to this, I was reading someone’s PySpark code.

And then it hit me – the problem (rather, my problem) with Python is that it is far more verbose than R. The number of characters or lines of code required to do something in Python is far more than what you need in R (especially if you are using the tidyverse family of packages, which I do, including sparklyr for Spark).
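To show what I mean by density, here is the kind of (entirely hypothetical) tidyverse pipeline I have in mind – `sales_df` and its columns are made up – where you group, aggregate, filter and sort in one readable chunk:

```r
library(dplyr)

# Hypothetical example: summarise sales by region and product, keep the big
# ones, and sort - one pipe, a few lines of logic.
summary_df <- sales_df %>%
  group_by(region, product) %>%
  summarise(revenue = sum(amount), orders = n(), .groups = "drop") %>%
  filter(revenue > 1e6) %>%
  arrange(desc(revenue))
```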

Why does the density of code matter? It has to do with aesthetics, modularity and ease of understanding.

Yesterday I was writing some code that I plan to put into production. After a few hours of coding, I looked at the code and felt disgusted with myself – it was one far-too-long monolithic block of code. It might have been fine while I was writing it, but I knew that if I were to revisit it in a week or two, I wouldn’t be able to understand what the hell was happening there.

I’ve never worked as a professional software engineer, but with the amount of coding I’ve done, I’ve worked out what is a “reasonable length for a code block”. It’s like that apocryphal story of Indian public examiners for high school exams who evaluate history answers based on how long they are – “if they were to place an ordinary Reynolds 045 pen vertically on the sheet, the answer should be longer than that for the student to get five marks”.

An answer in a high school history exam needs to be longer than this. A code block or function should be shorter than this.

It’s the reverse here. Approximately speaking, if you were to place a Reynolds pen vertically on the screen (at your normal font size), no function (or code block) should be longer than the pen.

This roughly approximates how much the eye can take in on one normal MacBook screen (I use a massive external monitor at work, and a less massive, but equally wide, one at home). If you have to keep scrolling up and down to understand the full logic, there is a higher chance you will make mistakes, and it is harder for someone else to understand the code.

Till recently (as in earlier this week) I would crib like crazy that people coding in Python made their code “too modular” – that I had to keep switching back and forth between different functions (and classes!!) to make sense of some logic, and that this made the code hard to debug (I still think there is a limit to how modular you can make your ETL code).

Now, however (I’m writing this on a Saturday – I’m not working today), from the code density perspective, it all makes sense to me.

The advantage of R is that because the code is far denser, you can pack far more logic into a Reynolds-pen length of code. So over the years I’ve gotten used to having that much logic presented to me in one chunk (without having to scroll or switch functions).

The relatively lower density of Python, however, means that the same amount of logic that would be one function in R is now split across several different functions. It is not that the writer of the code is “making this too modular” or “writing functions just for the heck of it”. It is just that their “mental Reynolds pen” doesn’t allow them to pack more lines into a chunk or function, and Python’s lower density means there is only so much logic that can go in there.

As part of my undergrad, I did a course on Software Engineering (and the one thing the professor told us then was that we should NOT take up software engineering as a career – “it’s a boring job”, he had said). In that, one of the things we learnt was that in conventional software services contexts, billing would happen as a (nonlinear) function of “kilo lines of code” (this was in early 2003).

Now, looking back, one thing I can say is that the rate per kilo line of R code ought to be much higher than the rate per kilo line of Python code.

Cross posted on my now-largely-dormant Art of Data Science newsletter

The Second Great Wall (of programming)

Back in 2000, I entered the Computer Science undergrad program at IIT Madras thinking I was a fairly competent coder. In my high school, I had a pretty good reputation in terms of my programming skills and had built a whole bunch of games.

By the time half the course was done I had completely fallen out of love with programming, deciding a career in Computer Science was not for me. I even ignored Kama (current diro)’s advice and went on to do an MBA.

What had happened? Basically it was a sudden increase in the steepness of the learning curve. Or that I’m a massive sucker for user experience, which the Computer Science program didn’t care for.

Back in school, my IDE of choice (rather the only one available) was TurboC, a DOS-based program. You would write your code, and then hit Ctrl+F9 to run the program. And it would just run. I didn’t have to deal with any technical issues. Looking back, we had built some fairly complex programs just using TurboC.

And then I went to IIT and found that there was no TurboC, no DOS. Most computers there had an ancient version of Unix (or worse, Solaris). These didn’t come with nice IDEs such as TurboC. Instead, you had to use vi (some of the computers were so old they didn’t even have vim) to write the code, and then compile it from outside.

Difficulties in coming to terms with vi meant that my speed of typing dropped. I couldn’t “code at the speed of thought” any more. This was the first give up moment.

Then, I discovered that C++ had now got this new “standard template library” (STL), with vectors and stuff. This was very alien to the way I had learnt C++ in school. Also, I found that some of my classmates were very proficient with it, and I just couldn’t keep up. The effort seemed too much (and the general workload of the program was so high that I couldn’t get much time for “learning by myself”), so I gave up once again.

Next, I figured that a lot of my professors were suckers for graphical UIs (though they denied us good UX by denying us good computers). This, circa 2001-2, meant programming in Java and writing applets. It was a massive degree of complexity (and “boringness”) compared to the crisp C/C++ code I was used to writing. I gave up yet again.

I wasn’t done with giving up yet. Beyond all of this, there was “systems programming”. You had to write some network layers and stuff. You had to go deep into the workings of the computer system to get your code to run. This came rather intuitively to most of my engineering-minded classmates. It didn’t to me (programming in C was the “deepest” I could grok). And I gave up even more.

A decade after I graduated from IIT Madras, I visited IIM Calcutta to deliver a lecture. And got educated.

I did my B.Tech. project in “theoretical computer science”, managed to graduate and went on to do an MBA. Just before my MBA, I was helping my father with some work, and he figured I sucked at Excel. “What is the use of completing a B.Tech. in computer science if you can’t even do simple Excel stuff?”, he thundered.

In IIMB, all of us bought computers with pirated Windows and Office. I started using Excel. It was an absolute joy. It was a decade before I started using Apple products, but the UX of Windows was such a massive upgrade compared to what I’d seen in my more technical life.

In my first job (where I didn’t last long) I learnt the absolute joy of Visual Basic macros for Excel. This was another level of unlock. I did some insane gymnastics in that. I pissed off a lot of people in my second job by replicating what they thought was a complex model on an Excel sheet. In my third job, I replaced a guy on my team with an Excel macro. My programming mojo was back.

Goldman Sachs’s SLANG was even better. By the time I left there, I had learnt R as well. And then I became a “data scientist”. People asked me to use Python. I struggled with it. After the user experience of R, this was too complex. It brought back bad memories of all the systems programming and bad UX I had encountered in my undergrad. This time I was in control (I was a freelancer), so I didn’t need to give up – I was able to get all my work done in R.

The second giving up

I’ve happily used R for most of my data work in the last decade. Recently at work I started using Databricks (I still write my code in R there, using sparklyr), and I’m quite liking that as well. However, in the last 3-4 months there have been a lot of developments in “AI” which I’ve wanted to explore.

The unfortunate thing is that most of this is available only in Python. And the bad UX problem is back again.

Initially I got excited, and managed to install Stable Diffusion on my personal Mac. I started writing some OpenAI code as well (largely using R). I started tracking developments in artificial intelligence, and trying some of them out.

And now, in the last 2-3 weeks, I’ve been struggling with “virtual environments”. Each newfangled open-source AI that is released comes with its own codebase and package requirements. They are all mutually incompatible. You install one package, and you break another package.

The “solution” to this, from what I could gather, is to use virtual environments – basically a sandbox for each of these things that I’ve downloaded. That, I find, is grossly inadequate. One of the points of using open source software is to experiment with connecting up two or more of them. And if each needs to be in its own sandbox, how is one supposed to do this? And how are all other data scientists and software engineers okay with this?

This whole virtual environment mess means that I’m giving up on programming once again. I won’t fully give up – I’ll continue to use R for most of my data work (including my job) – but I’m on the verge of giving up on these “complex AI” things.

It’s the UX thing all over again. I simply can’t handle bad UX. Maybe it’s my ADHD. But when something is hard to use, I simply don’t want to use it.

And so I’m giving up once again. Tagore was right.

Computer science and psychology

This morning, when I got back from the gym, my wife and daughter were playing 20 questions, with my wife having just taught my daughter the game.

Given that this was the first time they were playing, they started with guessing “2 digit numbers”. And when I came in, they were asking questions such as “is this number divisible by 6” etc.

To me this was obviously inefficient. “Binary search is O(log n)“, I realised in my head, and decided this is a good time to teach my daughter binary search.

So for the next game, I volunteered to guess, and started with “is the number ≥ 55?”. I went on to “is the number ≥ 77?”, and got to the number in my wife’s mind (74) in exactly 7 guesses (and you might notice that ⌈log₂ 90⌉, 90 being the number of 2-digit numbers, is 7).
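For the record, this is all my “process” was – a plain binary search over the two-digit numbers, assuming every number is equally likely. A quick sketch in R (the secret number here is just for illustration):

```r
# Plain binary search over 10-99: keep halving the range with "is it >= X?"
# questions. Takes at most ceiling(log2(90)) = 7 questions.
guess_number <- function(ask_is_geq) {
  lo <- 10; hi <- 99; tries <- 0
  while (lo < hi) {
    mid   <- ceiling((lo + hi) / 2)    # first question is "is it >= 55?"
    tries <- tries + 1
    if (ask_is_geq(mid)) lo <- mid else hi <- mid - 1
  }
  c(answer = lo, tries = tries)
}

secret <- 74
guess_number(function(x) secret >= x)
```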

And so we moved on. Next, I “kept” 41, and my wife went through a rather random series of guesses (including “is it divisible by 4” fairly early on) to get there in 8 tries. By this time I was feeling massively proud of putting my computer science knowledge to good use in real life.

“See, you keep saying that I’m not a good engineer. See how I’m using skills that I learnt in my engineering to do well in this game”, I exclaimed. My wife didn’t react.

It was finally my daughter’s turn to keep a number in mind, and my turn to guess.

“Is the number ≥ 55?”
“Yes”

“Is the number ≥ 77?”
“Yes”

“Is the number ≥ 88?”
“Yes”

My wife started grinning. I ignored it and continued with my “process”, and I got to the right answer (99) in 6 tries. “You are stupid and know nothing”, said my wife. “As soon as she said it’s greater than 88, I knew it is 99. You might be good at computer science but I’m good at psychology”.

She had a point. And then I started thinking – basically the binary search method works under the assumption that the numbers are all uniformly distributed. Clearly, my wife had some superior information, which made 99 far more probable than any number between 89 and 98. And so when the answer to “Is the number ≥ 88?” turned out to be “yes”, she made an educated guess that it was 99.

And since I’m used to writing algorithms, and teaching dumb computers to solve problems, I used a process that didn’t make use of any educated guesses! And thus took far more steps to get to the answer.

When the numbers don’t follow a uniform distribution, binary search works differently. You don’t start with the middle number – instead, you start with the weighted median of all the numbers, and then go on to the weighted median of whichever half you end up in, and so on until you find the number in the counterparty’s mind. That is (roughly) the optimal algo.
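Here is a rough sketch in R of that idea – instead of always splitting the remaining range in the middle, you split it at (roughly) the weighted median of whatever prior you have. The prior below is made up, heavily weighted towards 99, like my daughter’s choice:

```r
# Weighted guessing: always ask about a pivot that splits the remaining
# prior mass roughly in half, rather than the remaining count.
weighted_guess <- function(candidates, weights, ask_is_geq) {
  tries <- 0
  while (length(candidates) > 1) {
    cum   <- cumsum(weights) / sum(weights)
    idx   <- min(min(which(cum >= 0.5)) + 1, length(candidates))
    pivot <- candidates[idx]                  # ~weighted median of what's left
    tries <- tries + 1
    keep  <- if (ask_is_geq(pivot)) candidates >= pivot else candidates < pivot
    candidates <- candidates[keep]
    weights    <- weights[keep]
  }
  c(answer = candidates, tries = tries)
}

nums    <- 10:99
weights <- ifelse(nums == 99, 20, 1)          # made-up prior: 99 is very likely
weighted_guess(nums, weights, function(x) 99 >= x)   # homes in on 99 in a few tries
```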

Then again, how do you figure out what the prior distribution of numbers is? For that, I guess knowing some psychology helps.


Mo Salah and Machine Learning

First of all, I’m damn happy that Mo Salah has renewed his Liverpool contract. With Sadio Mane also leaving, the attack was looking a bit thin (I was distinctly unhappy with the Jota-Mane-Diaz forward line we used in the Champions League final. Lacked cohesion). Nunez is still untested in terms of “leadership”, and without Salah that would’ve left Firmino as the only “attacking leader”.

(non-technical readers can skip the section in italics and still make sense of this post)

Now that this is out of the way, I’m interested in seeing one statistic (for which I’m pretty sure I don’t have the data). For each of the chances that Salah has created, I want to look at the xG (expected goals) and whether he scored or not. And then look at a density plot of xG for both categories (scored or not). 

For most players, this is likely to result in two very distinct curves – they are likely to score from a large % of high xG chances, and almost not score at all from low xG chances. For Salah, though, the two density curves are likely to be a lot closer.
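Something like this (a ggplot2 sketch on simulated data, since I don’t have the actual xG numbers – for a “normal” striker the chance of scoring is roughly the xG itself, which is what I’ve simulated here):

```r
library(ggplot2)

# Made-up chances for a hypothetical "normal" striker: the probability of
# scoring each chance is its xG, so scored and missed chances end up with
# very different xG distributions. For Salah, the claim is these two curves
# would sit much closer together.
set.seed(7)
chances <- data.frame(xg = runif(500, 0.02, 0.9))
chances$scored <- rbinom(nrow(chances), 1, chances$xg)

ggplot(chances, aes(x = xg, fill = factor(scored))) +
  geom_density(alpha = 0.4) +
  labs(x = "xG of the chance", fill = "Scored",
       title = "xG density: scored vs missed chances (simulated)")
```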

What I’m saying is – most strikers score well from easy chances, and fail to score from difficult chances. Salah is not like that. On the one hand, he creates and scores some extraordinary goals out of nothing (low xG). On the other, he tends to miss a lot of seemingly easy chances (high xG).

In fact, it is quite possible to look at a player like Salah, see a few sitters that he has missed (he misses quite a few of them), and think he is a poor forward. And if you look at a small sample of data (or short periods of time) you are likely to come to the same conclusion. Look at the last 3-4 months of the 2021-22 season. The consensus among pundits then was that Salah had become poor (and on Reddit, you could see Liverpool fans arguing that we shouldn’t give him a lucrative contract extension since ‘he has lost it’).

It is quite possible that this is exactly the conclusion Jose Mourinho came to back in 2013-14, when he managed Salah at Chelsea (and gave him very few opportunities). The thing with a player like Salah is that he is so unpredictable that it is very easy to look at a small sample and think he is useless.

Of late, I’ve been doing (rather, supervising (no pun intended)) a lot of machine learning work. A lot of this has to do with binary classification – classifying something as either a 0 or a 1. Data scientists build models, which give out a probability score that the thing is a 1, and then use some (sometimes arbitrary) cutoff to determine whether the thing is a 0 or a 1.

There are a bunch of metrics in data science on how good a model is, and it all comes down to what the model predicted and what “really” happened. And I’ve seen data scientists work super hard to improve on these accuracy measures. What can be done to predict a little bit better? Why is this model only giving me 77% ROC-AUC when for the other problem I was able to get 90%?
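For the uninitiated, the mechanics look roughly like this (a bare-bones sketch with made-up scores):

```r
# The model spits out a probability score; a cutoff turns it into a 0/1 call;
# the metrics are then computed off what was predicted vs what "really" happened.
scores <- c(0.91, 0.35, 0.62, 0.08, 0.77, 0.55)   # made-up model scores
actual <- c(1,    0,    1,    0,    0,    1)
cutoff <- 0.5                                     # often an arbitrary choice

predicted <- as.integer(scores >= cutoff)
table(actual, predicted)     # confusion matrix at this cutoff
mean(predicted == actual)    # plain accuracy (ROC-AUC is the cutoff-free cousin)
```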

The thing is – if the variable you are trying to predict is something like whether Salah will score from a particular chance, your accuracy metric will be really low indeed. Because he is fundamentally unpredictable. It is the same with some of the machine learning stuff – a lot of models are trying to predict something that is fundamentally unpredictable, so there is a limit on how accurate the model will get.

The problem is that you would have come across so many problem statements that are much more predictable that you start thinking it is a problem with you (or your model) that you can’t predict better. Pundits (or Jose) would have seen so many strikers who predictably score from good chances that they think Salah is not good.

The solution in these cases is to look at aggregates. Looking at each single prediction will not take us anywhere. Instead, can we check whether, over a large set of data, we broadly got it right? In my “research” for this blogpost, I found this.

Last season, on average, Salah scored precisely as many goals as the model would’ve predicted! You might remember stunners like the one against Manchester City at Anfield. So you know where things got averaged out.

Pirate organisations

It’s over 20 years now since I took a “core elective” (yeah, the contradiction!) in IIT on “design and analysis of algorithms”. It was a stellar course, full of highly interesting assignments and quotable quotes. The highlight of the course was a “2 pm onwards” mid term examination, where we could take as much time as we wanted.

Anyway, the relevance of that course to this discussion is one of the problems in our first assignment. It was a puzzle.

It has to do with a large number of pirates who have chanced upon a number of gold coins. There is a strict rank ordering of pirates from most to least powerful (1 to N, with 1 being the most powerful). The problem is about how to distribute the coins among the pirates.

Pirate 1 proposes a split. If at least half the pirates (including himself) vote in favour of the split, the split is accepted and everyone goes home. If (strictly) more than half vote against the split, the proposer is thrown overboard and Pirate 2 proposes a split. This goes on until a split is accepted. Assuming all the pirates are perfectly rational, how would you split the coins if you were Pirate 1? There is a Wikipedia page on it.

I won’t go into the logic here, but the winning play for Pirate 1 is to give 1 coin to each of the other odd-numbered pirates, and keep the rest for himself. If he fails to do so and gets thrown overboard, the optimal solution for Pirate 2 is to give 1 coin to each of the other even-numbered pirates, and keep the rest for himself.
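If you want to see where that answer comes from, here is a backward-induction sketch in R, under the usual assumptions (a pirate votes for a proposal only if it strictly beats what he’d get otherwise, and ties between equally cheap voters are broken arbitrarily):

```r
# Work backwards from the case where only the weakest pirate is left (he keeps
# everything), and at each step the proposer buys the cheapest votes he needs.
pirate_split <- function(n_pirates, coins) {
  alloc <- coins                              # only the weakest pirate remains
  if (n_pirates == 1) return(alloc)
  for (k in (n_pirates - 1):1) {
    m <- n_pirates - k + 1                    # pirates still on board
    votes_needed <- ceiling(m / 2) - 1        # votes to buy, besides his own
    cheapest <- order(alloc)[seq_len(votes_needed)]   # who'd get least next round
    proposal <- rep(0, m)
    proposal[cheapest + 1] <- alloc[cheapest] + 1     # pay them 1 more than that
    proposal[1] <- coins - sum(proposal[-1])          # keep the rest
    alloc <- proposal
  }
  alloc                                       # alloc[i]: coins for i-th strongest
}

pirate_split(5, 100)   # 98 0 1 0 1 - Pirate 1 buys off the other odd-numbered pirates
```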

So basically you see that this kind of a game structure implies that all odd numbered pirates form a coalition, and all the even numbered pirates form another. It’s like if you were to paint all pirates in one coalition black, you would get a perfectly striped structure.

Now, this kind of “alternating coalition” can sometimes occur in corporate settings as well. Let us stick to just one path in the org chart, down to the lowest level of employee (so no “uncles” (in a tree sense) in the mix).

Let’s say you are having trouble with your boss and are unable to prevail upon her for some reason. Getting the support of your peers is futile in this effort. So what do you do? You go to your boss’s boss and try to get that person onside, and together you can take on your boss. This can occasionally be a winning strategy.

Similarly, let us say you seek to undermine (in the literal sense) one of your underlings who is being troublesome. What do you do? You ally with one of their underlings, to try and prevail upon your underling. Let’s say your boss and your underling have thought similarly to you – they will then ally to try and take you down.

Now see what this looks like – your boss’s boss, you and your underling’s underling are broadly allied. Your boss and your underling (and maybe your underling’s underling’s underling) are broadly allied. So it is the pirate problem yet again, with people at alternating levels in the hierarchy allying with each other!

Then again, in organisations, alliances and rivalries are never permanent. For each piece of work that you seek to achieve, you do what it takes and ally with the necessary people to finish it. And so, in the broad scheme of all the alliances that happen, this “pirate structure” is pretty rare – which is perhaps why it hasn’t been studied well enough.

PS: I was wondering recently why people don’t offer training programs in “corporate game theory”. The problem, I guess, is that no HR or L&D person will sponsor it – there is no point in having everyone in your org trained in the same kind of game theory – they will nullify each other and the training will go down the drain.

I suppose this is why you have leadership coaches – who are hired by individual employees to navigate the corporate games.

Management and Verification

For those of you who are new here, my wife and I used to organise “NED Talks” in our home in Bangalore. The first edition happened in 2015 (organised on a whim), and encouraged by its success, we organised 10 more editions until 2019. We have put up snippets of some talks here.

In the second edition of the NED Talks (February 2015), we had a talk by V Vinay (noted computer scientist, former IISc professor, co-inventor of Simputer, co-founder of Strand Life Sciences, Ati Motors, etc. etc.), where he spoke about “computational complexity”.

Now, having studied computer science, “computational complexity” was not a new topic to me, but one thing that Vinay said has stayed with me – that verifying the output of an algorithm is far more efficient than actually executing the algorithm.

To take a simple example, factorising a number into prime factors is computationally hard – there is no known efficient (polynomial-time) algorithm for it. However, verifying a prime factorisation is trivial – you can just multiply the factors and see if they give back the number you started with.
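In code terms, the asymmetry is something like this (a trivial R example):

```r
# Finding the prime factors of a large number is hard work; checking a claimed
# factorisation is just a multiplication (plus checking each factor is prime).
claimed_factors <- c(3, 5, 7, 19)
prod(claimed_factors) == 1995    # TRUE - the claimed factorisation checks out
```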

I was reminded of this paradigm the other day when I was thinking about professional managers – several times in life I have wondered, “how can this person manage this function when he/she has no experience in that function?”. Maybe it is because I was subjected to two semesters of workshop at the beginning of my engineering, but I have intuitively assumed that you can only manage stuff that you have personally done – especially if it is a non-trivial / specialist role.

But then – if you think about it, at some level, management is basically about “verification”. To see whether you have done your work properly, I don’t need to precisely know how you have done it. All I need to know is whether you have done bullshit – which means, I don’t need to “replicate your algorithm”. I only need to “verify your algorithm”, which computer science tells us can be an order of magnitude simpler than actually building the algorithm.

The corollary of this is that if you have managed X, you need not be good at X, or even have actually done X. All it shows is that you know how to manage X, which can be an order of magnitude simpler than actually doing X.

This also (rather belatedly) explains why I have largely been wary of hiring “pure managers” for my team. Unless they have been hands on at their work, I start wondering if they actually know how to do it, or only know how to manage it (and I’m rather hands on, and only hire hands on people).

And yet another corollary is that if you have spent too long just managing teams, you might have gotten so used to just verifying algorithms that you can’t write algorithms any more.

And yet another before I finish – computer science has a lot of lessons to offer life.


Jio, Amazon and Information Content

A long long time ago I had installed the Jio Cinema app on my Fire TV Stick. I had perhaps watched two movies on it, and then completely forgotten about it. This evening, I had to look for a movie to watch with the wife, and having exhausted most of the “compatible content” (stuff we can watch together on Netflix) and been exhausted by the user experience on Prime Video, I decided to give this app a try.

I ended up selecting a movie, which I later found out has a 4.5 IMDB rating and doesn’t even have a Wikipedia page. Needless to say, we abandoned the movie midway. That’s when the wife went in to put the daughter to bed and my fun began.

So Jio Cinema follows what I call the “Amazon paradigm for product management”. Since Amazon tries to sell every product (or service) as if it is a physical book, it has one single mantra for product management. “Improve selection and they will come”.

The user experience doesn’t matter. How easy the product is to use, how pleasing it is on the eye, and whether it has occasional bugs – all that is secondary. All that matters is selection. Given that the company built its business on the back of selling “long tail” books, this is not so surprising, except that it doesn’t necessarily work in other categories.

I’ve written about Amazon’s ineptitude in product management before, in the context of that atrocity of an app called Sony Liv. The funny thing is that the Jio Cinema app (on Fire TV Stick) looks and feels pretty much exactly like Sony Liv. Maybe there is an open source shitty fire TV app that these guys have based themselves on?

In any case, I started browsing the Jio Cinema app, and I found something called “movies in 15 minutes“. Initially I thought it was a parody. The first few movies I noticed there were things I had never heard of. “This is perhaps for bad movies”, I reasoned. I kept scrolling, and more recognisable names popped up.

I decided to watch Deewana, which was released just before the start of my optimal age of movie appreciation, and which, for some reason, we never got a video cassette of at home.

It’s basically a collage of scenes from the movie. It’s like someone has put together a “highlights package”, taking all the important scenes and then putting them together.

And for a movie like Deewana it works. The 15-minute version had all the necessary plot elements to fully follow the movie. It is a great movie, for 15 minutes. Maybe at 30 minutes it might be a great movie as well. However, I can’t imagine having watched it in the full version.

That was two hours back. I’ve since gone crazy watching 15 minute versions of many other movies (mostly from the 70s and 80s, though they have movies as recent as Jab We Met). It’s been fantastic.

However, I have one crib. It has to do with information content. Essentially, the premise behind “movies in 15 minutes” is that the information content in these movies is so little that the whole thing can be compressed into 15 minutes. The problem is that not every movie has the same amount of information.

15 minutes was perfect for Deewana. It was also appropriate for Kasam Paida Karne Waali Ki, which I watched only because it gets referenced in Gangs of Wasseypur. Between these two, I “watched” Namak Halaal, and I couldn’t make head or tail of it. I had to go to Wikipedia to understand the plot.

Essentially the plot of Namak Halaal is complex enough, I imagine, that compressing it into 15 minutes is impossible without significant information loss. And the loss of information was so much that I couldn’t understand the summary at all. Maybe I’ll watch the movie in full some day.

I’m writing this blogpost after watching the 15 minute version of Don. I guess whoever made the summary realised that the movie is so complex that it can’t really be compressed into 15 minutes – and so they have added a voiceover to narrate the key elements.

In any case, I’m feeling super thrilled. I normally don’t watch movies because the bit rate in most movies is too low. Compression means that I can happily watch the movies without ever getting bored.

I wish they made these 15-minute versions of all movies. Jio, all (your Amazon-style product management) is forgiven.

Now on to Amar Akbar Anthony.

Java and IIT Madras

At the end of my B.Tech. in Computer Science and Engineering from IIT Madras, I was very clear about one thing – I didn’t want to be an engineer. I didn’t want to pursue a career in Computer Science, either. This was after entering IIT with a reputation of being a “stud programmer”, and being cocky and telling people that my hobby was “programming”.

I must have written about this enough times on this blog that I can’t be bothered about finding links, but my Computer Science degree at IIT Madras made me hate programming. I didn’t mind (some of) the maths, but it was the actual coding bit that I actively came to hate. And when an internship told me that research wasn’t something I was going to be good at, fleeing the field was an obvious decision and I quickly went to business school.

Thinking back about it, I think my problem is that I give up when faced with steep learning curves. I like systems that are easy and intuitive to use, and have a great user experience. The “geeky” products that are difficult to use, and that geeks take pride in, I have no patience for. I remember that learning to code macros in Microsoft Excel, in my first post-B-school job in 2006, was when I started falling in love with Computer Science once again.

The big problem with CSE at IIT Madras was that they made you code. A lot. Which you might think is totally normal for a program in computer science. Except that all the professors there were perhaps like me, and wanted systems that were easy to use, which meant that just about anything we needed to build, we needed to build a user interface for. And in 2002, that meant coding in Java, and producing those ugly applets which were interactive but anything but easy to use.

The amount of Java coding I did in those four years is not funny. And Java is a difficult language to code in – it’s incredibly verbose and complicated (especially compared to something like Python, for example), and impossible to code without a book or a dictionary of APIs handy. And because it’s so verbose, it’s buggy. And you find it difficult to make things work. And even when you make it work, the UI that it produces is incredibly ugly.

So it amused me to come across this piece of news that my old department has “developed a new framework that could make the programs written in JAVA language more efficient“. I don’t know who uses Java any more (I thought the language of choice among computer scientists nowadays is Python. While it’s infinitely easier than Java, it again produces really ugly graphics), but it’s interesting that people in my old department are still at it. And even going about making things more efficient!

Also, you might find the article itself (this is on the IIT alumni website) amusing. Go ahead and give it a read.

To solve this problem, V Krishna and Manas Thakur tweaked the two compilation procedures. In the first compilation step, more elaborate and time-consuming analysis is performed and wherever the conversion stalls due to unavailability of the library from the computer, a partial result is created. Now, during the second stage of compilation, the just in-time compilers, with available libraries from the computer, work to resolve the partial values to generate final values and finally a more precise result. As the time taken during the first exhaustive compilation does not get included in execution time, the whole procedure still remains time-saving, while leading to highly efficient codes

Hypothesis Testing in Monte Carlo

I find it incredible, and not in a good way, that I took fourteen years to make the connection between two concepts I learnt barely a year apart.

In August-September 2003, I was auditing a graduate course on Advanced Algorithms, where we learnt about randomised algorithms (I soon stopped auditing since the maths got heavy). And one important class of randomised algorithms is what is known as “Monte Carlo algorithms”. Not to be confused with Monte Carlo simulations, these are randomised algorithms that give a one-sided result. So, using the most prominent example of such an algorithm, you can ask “is this number prime?” and the answer can be either “maybe” or “no”.

The randomised algorithm can never conclusively answer “yes” to the primality question. If the algorithm can find evidence that the number is composite, it answers “no” (this is conclusive). Otherwise it returns “maybe”. So the way you “conclude” that a number is prime is by running the test a large number of times. Each run that returns “maybe” reduces the probability that the true answer is “no” (since they’re all independent checks), and when the probability of “no” is low enough, you “think” it’s a “yes”. You might like this old post of mine regarding Monte Carlo algorithms in the context of romantic relationships.
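For flavour, here is a minimal sketch in R of one such one-sided check – a Fermat-style primality test. This is only meant to illustrate the “no / maybe” structure (it works only for smallish numbers, and ignores annoyances like Carmichael numbers), not to do serious number theory:

```r
# Square-and-multiply modular exponentiation (exact only for smallish mod,
# since R doubles stay exact below 2^53).
mod_pow <- function(base, exp, mod) {
  result <- 1
  base <- base %% mod
  while (exp > 0) {
    if (exp %% 2 == 1) result <- (result * base) %% mod
    base <- (base * base) %% mod
    exp  <- exp %/% 2
  }
  result
}

fermat_check <- function(n, trials = 20) {
  if (n < 4) return(n >= 2)                    # 2 and 3 are prime
  if (n %% 2 == 0) return(FALSE)               # conclusive "no"
  for (i in seq_len(trials)) {
    a <- sample(2:(n - 2), 1)
    if (mod_pow(a, n - 1, n) != 1) return(FALSE)   # conclusive "no"
  }
  TRUE   # every trial said "maybe", so we *think* it's prime
}

fermat_check(97)   # TRUE: 97 is prime
fermat_check(91)   # almost surely FALSE: 91 = 7 * 13
```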

Less than a year later, in July 2004, as part of a basic course in statistics, I learnt about hypothesis testing. Now (I’m kicking myself for failing to see the similarity then), the main principle of hypothesis testing is that you can never “accept a hypothesis”. You either reject a hypothesis or “fail to reject” it. And if you fail to reject a hypothesis with a certain high probability (basically with more data, which implies more independent evaluations that don’t say “reject”), you will start thinking about “accept”.

Basically hypothesis testing is a one-sided test, where you are trying to reject a hypothesis. And not being able to reject a hypothesis doesn’t mean we necessarily accept it – there is still a chance of going wrong if we were to accept it (this is where we get into messy territory such as p-values). And this is exactly like Monte Carlo algorithms – one-sided algorithms where we can only conclusively take a decision one way.
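The same one-sidedness is baked into any standard statistical test. A toy R example:

```r
# Toss a genuinely fair coin 100 times and test "is the coin fair?".
# A large p-value only means we fail to reject fairness - it doesn't prove it.
flips <- rbinom(100, 1, 0.5)
binom.test(sum(flips), 100, p = 0.5)
```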

So I was thinking of these concepts when I came across this headline in ESPNCricinfo yesterday that said “Rahul Johri not found guilty” (not linking since Cricinfo has since changed the headline). The choice, or rather ordering, of words was interesting. “Not found guilty”, it said, rather than the usual “found not guilty”.

This is again a case of one-sided testing. An investigation can either find someone guilty or it can fail to do so, and the headline in this case suggested that the latter had happened. And it became apparent why the headline was deliberately worded this way – it later emerged that the decision to clear Rahul Johri of sexual harassment charges was a contentious one.

In most cases, when someone is “found not guilty” following an investigation, it usually suggests that the evidence on hand was enough to say that the chance of the person being guilty was rather low. The phrase “not found guilty”, on the other hand, says that one test failed to reject the hypothesis, but it didn’t have sufficient confidence to clear the person of guilt.

So due credit to the Cricinfo copywriters, and due debit to the product managers for later changing the headline rather than putting out a fresh follow-up piece.

PS: The discussion following my tweet on the topic threw up one very interesting insight – that Scotland has had a “not proven” verdict for precisely such cases (you can trust DD for coming up with such gems).