Hill Climbing in real life

Fifteen years back, I enrolled in a course on Artificial Intelligence as part of my B.Tech. programme at IIT Madras. It was well before stuff like “machine learning” and “data science” became big, and the course was mostly devoted to heuristics. Incidentally, that term, we had to pick between this course and one on Artificial Neural Networks (I guess nowadays that one is more popular, given the hype about Deep Learning?), which meant that I didn’t learn about neural networks until last year or so.

A little googling tells me that Deepak Khemani, who taught us AI in 2002, has put up his lectures online, as part of the NPTEL programme. The first one is here:

In fact, the whole course is available here.

Anyways, one of the classes of problems we dealt with in the course was “search”. Basically, how does a computer “search” for the solution to a problem within a large “search space”?

One of the simplest heuristics is what has come to be known as “hill climbing” (too lazy to look through all of Khemani’s lectures and find where he’s spoken about this). I love computer science because a lot of computer scientists like to describe ideas in terms of intuitive metaphors. Hill climbing is definitely one such!

Let me explain it from the point of view of my weekend vacation in Edinburgh. One of my friends who had lived there a long time back recommended that I hike up this volcanic hill in the city called “Arthur’s Seat”.

On Saturday evening, I left my wife and daughter and wife’s parents (who I had travelled with) in our AirBnB and walked across town (some 3-4 km) to reach Holyrood Palace, from where Arthur’s Seat became visible. This is what I saw:

Basically, what you see is the side of a hill, and if you look closely, there are people walking up the sides. So you guess that you need to make your way to the bottom of the hill and then just climb.

But then you make your way to the base of the hill and see several paths leading up. Which one do you take? You take the path that seems steepest, believing that’s the one that will take you to the top quickest. And so you take a step along that path. And then see which direction to go to climb up steepest. Take another step. Rinse. Repeat. Until you reach a point where you can no longer find a way up. Hopefully that’s the peak.

Most of the time, though, you are likely to end up on top of a smaller rock – a local maximum, rather than the true peak. In any case, this is the hill climbing algorithm.
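In pseudocode-ish Python, the procedure looks something like this. The “terrain” function, the step size and the starting points are all made up for illustration – this is a sketch of the idea, not anything from the actual hike:

```python
import math

def hill_climb(height, start, step=0.1, max_iters=1000):
    """Greedy hill climbing: repeatedly step towards the higher neighbour.

    Stops when no neighbouring point is higher - which may be a local
    maximum (the top of a smaller rock) rather than the true peak.
    """
    x = start
    for _ in range(max_iters):
        neighbours = [x - step, x + step]
        best = max(neighbours, key=height)
        if height(best) <= height(x):
            return x  # no way up from here
        x = best
    return x

# A toy "terrain": a small hill near x=1 and a taller one near x=4.
def terrain(x):
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-((x - 4) ** 2) / 2)

print(round(hill_climb(terrain, start=0.0), 1))  # 1.0 - stuck on the smaller hill
print(round(hill_climb(terrain, start=3.0), 1))  # 4.0 - the true peak
```

Note how the answer depends entirely on where you start – which is pretty much what happened to me in Edinburgh.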

So back to my story. I reached the base of the hill and set off on the steepest marked path.

I puffed and panted, but I kept going. It was rather windy that day, and it was threatening to rain. I held my folded umbrella and camera tight, and went on. I got beautiful views of Edinburgh city, and captured some of them on camera. And after a while, I got tired, and decided to call my wife using Facetime.

In any case, it appeared that I had a long way to go, given the rocks that went upwards just to my left (I was using a modified version of hill climbing in that I used only marked paths. As I was to rediscover the following day, I have a fear of heights). And I told that to my wife. And then suddenly the climb got easier. And before I knew it I was descending. And soon enough I was at the bottom all over again!

And then I saw the peak. Basically, what I had been climbing all along was not the main hill at all! It was a “side hill”, which I later learnt is called the “Salisbury Crags”. I got down to the valley between the two hills, and stared at it. I realised it was a “saddle point”, and, hungry and tired and not wanting to get soaked in rain, I made my way out, hailed a cab and went home.

I wasn’t done yet. Determined to climb the “real peak”, I returned the next morning. Again I walked all the way to the base of the hill, and started my climb at the saddle point. It was a tough climb – while there were rough steps in some places, in others there were none. I kept climbing a few steps at a time, taking short breaks.

One such break happened to be too long, though, and gave me enough time to look down and feel scared. For a long time now I’ve had a massive fear of heights. Panic hit. I was afraid of going too close to the edge and falling off the hill. I decided to play it safe and turn back.

I came down and walked across the valley you see in the last picture above. Energised, I had another go. From what was possibly a relatively easier direction. But I was too tired. And I had to get back to the apartment and check out that morning. So I gave up once again.

I still have unfinished business in Edinburgh!

Maths, machine learning, brute force and elegance

Back when I was at the International Maths Olympiad Training Camp in Mumbai in 1999, the biggest insult one could hurl at a peer was to describe the latter’s solution to a problem as being a “brute force solution”. Brute force solutions, which were often ungainly, laboured and unintuitive, were supposed to be the last resort, to be used only if one were thoroughly unable to implement an “elegant solution” to the problem.

Mathematicians love and value elegance. While they might be comfortable with esoteric formulae and the Greek alphabet, they are always on the lookout for solutions that are, at least to the trained eye, intuitive to perceive and understand. Among other things, there is the belief that it is much easier to get an intuitive understanding of an elegant solution.

When all the parts of the solution seem to fit so well into each other, with no loose ends, it is far easier to accept the solution as being correct (even if you don’t understand it fully). Brute force solutions, on the other hand, inevitably leave loose ends and appreciating them can be a fairly massive task, even to trained mathematicians.

In the conventional view, though, non-mathematicians don’t have much fondness for elegance. A solution is a solution, and a problem solved is a problem solved.

With the coming of big data and increased computational power, however, the tables are getting turned. In this case, the more mathematical people, who are more likely to appreciate “machine learning” algorithms, recommend “leaving it to the system” – to unleash the brute force of computational power on the problem so that the “best model” can be found, and later implemented.

And in this case, it is the “half-blood mathematicians” like me, who are aware of complex algorithms but are unsure of letting the system take over stuff end-to-end, who bat for elegance – to look at data, massage it, analyse it and then find that one simple method or transformation that can throw immense light on the problem, effectively solving it!

The world moves in strange ways.

Schoolkid fights, blockchain and smart contracts

So I’ve been trying to understand the whole blockchain thing better, since people nowadays seem to be wanting to use it for all kinds of contracts (even the investment bankers are taking interest, which suggests there’s some potential out there 😛 ).

One of the things I’ve been doing is to read this book (PDF) on Blockchain by Arvind Narayanan and co at Princeton. It’s an easy-to-read, yet comprehensive, take on bitcoin and cryptocurrency technologies, the maths behind them and so on.

And as I’ve been reading it, I’ve been developing my own oversimplified model of what blockchain and smart contracts are, and this is my take at explaining it.

Imagine that Alice and Bob are two schoolkids and they’ve entered into a contract which states that if Alice manages to climb a particular tree, Bob will give her a bar of chocolate. Alice duly climbs the tree and claims the chocolate, at which point Bob flatly denies that she climbed it and refuses to give her the chocolate. What is Alice to do?

In the conventional “contract world”, all that Alice can do is to take the contract that she and Bob had signed (assume they had formalised it) and take it to a court of law (a schoolteacher, perhaps, in this case), which will do its best possible in order to determine whether she actually climbed the tree, and then deliver the judgment.

As you may imagine, in the normal schoolkid world, going to a teacher to adjudicate on whether someone climbed a tree (most likely an “illegal” activity by school rules) is not the greatest way to resolve the fight. Instead, either Alice and Bob will try to resolve it by themselves, or call upon their classmates to do the same. This is where the blockchain comes in.

Simply put, in terms of the blockchain “register”, as long as more than half of Alice and Bob’s classmates agree that she climbed the tree, she is considered to have climbed the tree, and Bob will be liable to give her the chocolate. In other words, the central “trusted third party” gets replaced by a decentralised crowd of third parties where the majority decision is taken to be the “truth”.

Smart contracts take this one step further. Bob will give the bar of chocolate to the collective trust of his classmates (the adjudicators). And if a majority of them agree that Alice did climb the tree, the chocolate will be automatically given to her. If not, it will come back to Bob. What blockchain technologies allow for is to write code in a clever manner so that this can get executed automatically.

This might be a gross oversimplification, but this is exactly how the blockchain works. Each transaction is considered “valid” and put into the blockchain if a majority of nodes agrees it’s valid. And in order to ensure that this voting doesn’t get rigged, the nodes (or judges) need to perform a difficult computational puzzle in order to be able to vote – this imposes an artificial cost of voting which makes sure that it’s not possible to rig the polls unless you can take over more than half the nodes – and in a global blockchain where you have a really large number of nodes, this is not feasible.
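To illustrate the “computational puzzle” bit – this is nothing like actual Bitcoin mining in scale or detail, just the same idea in miniature – a node has to find a “nonce” whose hash starts with a few zeros before its vote counts:

```python
import hashlib

def solve_puzzle(vote: str, difficulty: int = 4) -> int:
    """Find a nonce such that sha256(vote + nonce) starts with
    `difficulty` zero hex digits. The expected work grows exponentially
    with difficulty - this is what makes vote-rigging expensive."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{vote}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(vote: str, nonce: int, difficulty: int = 4) -> bool:
    """Checking a solution is cheap - a single hash - unlike finding one."""
    digest = hashlib.sha256(f"{vote}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = solve_puzzle("Alice climbed the tree")
print(verify("Alice climbed the tree", nonce))  # True
```

The asymmetry is the point: voting (finding the nonce) is costly, but everyone else can verify your vote with a single hash.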

So when you see that someone is building a blockchain based solution for this or that, you might wonder whether it actually makes sense. All you need to do is to come back to this schoolkid problem – for the kind of dispute that is likely to arise from this problem, would the parties prefer to go to a mutually trusted third party, or leave it to the larger peer group to adjudicate? Using the blockchain is a solution if and only if the latter case is true.

Programming back to the 1970s

I learnt to write computer code circa 1998, at a time when resources were plenty. I had a computer of my own – an assembled desktop with a 386 processor and RAM that was measured in MBs. It wasn’t particularly powerful, but it was more than adequate to handle the programs I was trying to write.

I wasn’t trying to process large amounts of data. Even when the algorithms were complex, they weren’t that complex. Most code ran in a matter of minutes, which meant that I didn’t need to bother about getting the code right the first time round – apart from for examination purposes. I could iterate and slowly get things right.

This was markedly different from how people programmed back in the 1970s, when computing resource was scarce and people mostly had to write code on paper. Time had to be booked at computer terminals, where the code would be copied onto the computers and then run. The amount of time it took for the code to run meant that you had to get it right the first time round. Any mistake meant standing in line at the terminal again, and further time to run the code.

The problem was particularly dire in the USSR, where the planned economy meant that shortages of computer resources were more acute. This has been cited as a reason why Russian programmers who migrated to the US were prized – they had practice in writing code that worked the first time.

Anyway, the point of this post is that coding became progressively easier through the second half of the 20th century, when Moore’s Law was in operation, and computers became faster, smaller and significantly more abundant.

This process continues – computers continue to become better and more abundant – smartphones are nothing but computers. On the other side, however, as storage has gotten cheap and data capture has gotten easier, data sources are significantly larger now than they were a decade or two back.

So if you are trying to write code that uses a large amount of data, it means that each run can take a significant amount of time. When the data size reaches big data proportions (when it all can’t be processed on a single computer), the problem is more complex.

And in that sense, every time you want to run a piece of code, however simple it is, execution takes a long time. This has made bugs much more expensive again – the amount of time programs take to run means that you lose a lot of time in debugging and rewriting your code.

It’s like being in the 1970s all over again!

I don’t know if I’ve written about this before (that might explain how I crossed 2000 blogposts last year – multiple posts about the same thing), but anyway – I’m writing this listening to Aerosmith’s Dream On.

I don’t recall when the first time was that I heard the song, but I somehow decided that it sounded like Led Zeppelin. It was before 2006, so I had no access to services such as Shazam to search effectively. So for a long time I continued to believe it was by Led Zep, and kept going through their archives to locate the song.

And then in 2006, Pandora happened. It became my full-time listening at work (bless those offshored offices with fast internet and US proxies). I would seed stations with songs I liked (back then there was no option to directly play songs you liked – you could only seed stations). I discovered plenty of awesome music that way.

And then one day I had put on a Led Zeppelin station and started work. The first song was by Led Zeppelin itself. And then came Dream On. And I figured it was a song by Aerosmith. While I chided myself for not having identified the band correctly, I was happy that I hadn’t been that wrong – given that Pandora uses machine learning on song patterns to identify similar songs, that Dream On had appeared in a LedZep playlist meant that I hadn’t been too far off identifying it with that band.

Ten years on, I’m not sure why I thought Dream On was by Led Zeppelin – I don’t see any similarities any more. But maybe the algorithms know better!

Coin change problem with change – Dijkstra’s Algorithm

The coin change problem is a well studied problem in Computer Science, and is a popular example given for teaching students Dynamic Programming. The problem is simple – given an amount and a set of coins, what is the minimum number of coins that can be used to pay that amount?

So, for example, if we have coins for 1,2,5,10,20,50,100 (like we do now in India), the easiest way to pay Rs. 11 is by using two coins – 10 and 1. If you have to pay Rs. 16, you can break it up as 10+5+1 and pay it using three coins.

The problem with the traditional formulation of the coin change problem is that it doesn’t involve “change” – the payer is not allowed to take back coins from the payee. So, for example, if you’ve to pay Rs. 99, you need to use 6 coins (50+20+20+5+2+2). On the other hand, if change is allowed, Rs. 99 can be paid using just 2 coins – pay Rs. 100 and get back Re. 1.

So how do you determine the way to pay using the fewest coins when change is allowed? In other words, what happens to the coin change problem when negative coins can be used? (Paying 100 and getting back 1 is the same as paying 100 and (-1).)

Unfortunately, dynamic programming doesn’t work in this case, since we can no longer process amounts in increasing order. For example, the optimal way to pay 9 rupees when negatives are allowed is to break it up as (+10, -1) – the solution for 9 depends on the solution for 10, a larger amount, so building up from 0 (as we do in the DP) breaks down.

For this reason, I’ve used an implementation of Dijkstra’s algorithm to determine the minimum number of coins to be used to pay any amount when cash back is allowed. Each amount is a node in the graph, with an edge between two amounts if the difference in amounts can be paid using a single coin. So there is an edge between 1 and 11 because the difference (10) can be paid using a single coin. Since cash back is allowed, the graph need not be directed.

So all we need to do to determine the way to pay each amount most optimally is to run Dijkstra’s algorithm starting from 0. Since all edges have the same weight, this reduces to a breadth-first search, with complexity O(Mn), where M is the maximum amount we want to pay and n is the number of coins (each node has at most 2n edges).
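For readers who prefer code to prose, here is a rough Python sketch of the same idea (my actual implementation is in R; this is an illustrative equivalent, and the cap on how far you can “overpay” past the target is an assumption I’ve made to keep the graph finite):

```python
from collections import deque

def min_coins_with_change(amount, coins, max_overpay=None):
    """BFS from 0 over amounts: each step either pays a coin (+c) or
    receives one as change (-c). Returns the fewest coins needed."""
    if max_overpay is None:
        max_overpay = max(coins)  # assumed bound on overpaying
    limit = amount + max_overpay
    dist = {0: 0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        if node == amount:
            return dist[node]  # BFS guarantees this is minimal
        for c in coins:
            for nxt in (node + c, node - c):
                if 0 <= nxt <= limit and nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return None  # unreachable with these coins

coins = [1, 2, 5, 10, 20, 50, 100]
print(min_coins_with_change(11, coins))  # 2: 10 + 1
print(min_coins_with_change(99, coins))  # 2: pay 100, get 1 back
```

Note that restricting intermediate amounts to be non-negative loses nothing, since the order in which the coins change hands doesn’t matter.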

I’ve implemented this algorithm using R, and the code can be found here. I’ve also used the algorithm to compute the number of coins to be used to pay all numbers between 1 and 10000 under different scenarios, and the results of that can be found here.

You can feel free to use this algorithm or code or results in any of your work, but make sure you provide appropriate credit!

PS: I’ve used “coin” here in a generic sense, in that it can mean “note” as well.

The Birthday Party Problem

Next Tuesday is my happy birthday. As of now, I’m not planning to have a party. And based on some deep graph-theoretic analysis that the wife and I just did over the last hour, it’s unlikely I will – for forming a coherent set of people to invite, it seems, is an NP-hard problem.

So five birthdays back we had a party, organised by the wife and meant as a surprise to me. On all counts it seemed like a great party. Except that the guests decided to divide themselves into one large clique and one smaller clique (of 2 people), leaving me as the cut vertex trying to bridge these cliques. That meant the onus was on me to make sure the tiny clique felt included in the party, and it wasn’t a lot of fun.

The problem is this – how do you invite a subset of friends for a party so that intervention by the host to keep guests entertained is minimised?

Let’s try and model this. Assume your friends network can be represented by an unweighted undirected graph, with a pair of friends being connected by an edge if they know (and get along with) each other already. Also assume you have full information about this graph (not always necessary).

The problem lies in selecting a subgraph of this graph such that you can be confident that it won’t break into smaller pieces (since that will mean you bonding with each such sub-group), and no guest feels left out (since the onus of making them comfortable will fall on you).

Firstly, the subgraph needs to be connected. Then, we can safely eliminate all guests who have a degree of either zero or one (the former is obvious; the latter since they’ll be too dependent on their only friend). In fact, we can impose a condition that each guest should have a minimum degree of two even in the subgraph.

Then we need to impose conditions on a group in the party breaking away. We can assume that for a group of people to break away, they need to be a clique (it is not a robust requirement, since you and someone you find at a party can suddenly decide to find a room, but reasonable enough).

We can also assume that for a group to break away, the strength of their mutual connections should outweigh the strength of their connections to the rest of the group. Since we’re using unweighted graphs here, we can simply assume that a group can break away if the number of edges between this group and the rest of the network is less than the size of the group.

So if there is a group of three who, put together, have two connections to the rest of the party, the group can break away. Similarly, a clique of four will break away from the main group if they have three or fewer edges going over. And let’s assume that the host is not a part of this subgroup of guests.
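These conditions – connectedness, a minimum degree of two, and no clique with fewer cross-edges than members – can be checked mechanically. Here’s a rough Python sketch (the graph representation is my own choice, and the brute-force enumeration of cliques is exponential, consistent with the suspicion that the full problem is hard):

```python
from itertools import combinations

def is_valid_party(guests, friends):
    """Check a guest list against the conditions in the post.

    `friends` is a set of frozenset({a, b}) pairs who get along.
    """
    guests = set(guests)
    adj = {g: {h for h in guests if frozenset({g, h}) in friends}
           for g in guests}

    # Everyone needs at least two friends at the party.
    if any(len(adj[g]) < 2 for g in guests):
        return False

    # The party graph must be connected.
    seen, stack = set(), [next(iter(guests))]
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(adj[g] - seen)
    if seen != guests:
        return False

    # No clique may have fewer edges to the rest than members
    # (the full guest set is excluded - everyone "breaking away"
    # together is just the party itself).
    for size in range(2, len(guests)):
        for group in combinations(guests, size):
            group = set(group)
            if all(frozenset({a, b}) in friends
                   for a, b in combinations(group, 2)):
                cross = sum(len(adj[g] - group) for g in group)
                if cross < len(group):
                    return False
    return True

# A 4-cycle of guests: connected, all degree 2, no breakaway clique.
square = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}
print(is_valid_party(["a", "b", "c", "d"], square))  # True

# Two triangles joined by one edge: the triangle {a,b,c} has only one
# edge to the rest, so it breaks away.
tri2 = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"),
                               ("d", "e"), ("e", "f"), ("d", "f"),
                               ("c", "d")]}
print(is_valid_party(["a", "b", "c", "d", "e", "f"], tri2))  # False
```

The search problem then becomes: find a subset of friends, within the size limits, for which this check passes.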

Given these constraints, and constraints on party size (minimum and maximum number of guests to invite), how can we identify an appropriate subset of friends to invite to the party? I’m assuming this problem is NP-hard (without thinking too much about it) – so can we think of a good heuristic to solve it?

Do let me know the answer before next Tuesday, else I may not be able to have a party this time as well!