Hill Climbing in real life

Fifteen years back, I enrolled in a course on Artificial Intelligence as part of my B.Tech. programme at IIT Madras. It was well before stuff like “machine learning” and “data science” became big, and the course was mostly devoted to heuristics. Incidentally, that term we had to pick between this course and one on Artificial Neural Networks (I guess nowadays that one is more popular, given the hype around Deep Learning?), which meant that I didn’t learn about neural networks until last year or so.

A little googling tells me that Deepak Khemani, who taught us AI in 2002, has put up his lectures online, as part of the NPTEL programme. The first one is here:

In fact, the whole course is available here.

Anyways, one of the classes of problems we dealt with in the course was “search”. Basically, how does a computer “search” for the solution to a problem within a large “search space”?

One of the simplest heuristics is what has come to be known as “hill climbing” (I’m too lazy to look through all of Khemani’s lectures and find where he’s spoken about it). I love computer science because a lot of computer scientists like to describe ideas in terms of intuitive metaphors. Hill climbing is definitely one such!

Let me explain it from the point of view of my weekend vacation in Edinburgh. One of my friends who had lived there a long time back recommended that I hike up this volcanic hill in the city called “Arthur’s Seat”.

On Saturday evening, I left my wife, daughter and wife’s parents (whom I had travelled with) in our AirBnB and walked across town (some 3-4 km) to reach Holyrood Palace, from where Arthur’s Seat became visible. This is what I saw:

Basically, what you see is the side of a hill, and if you look closely, there are people walking up the sides. So you guess that you need to make your way to the bottom of the hill and then just climb.

But then you make your way to the base of the hill and see several paths leading up. Which one do you take? You take the path that seems steepest, believing that’s the one that will take you to the top quickest. And so you take a step along that path. And then see which direction to go to climb up steepest. Take another step. Rinse. Repeat. Until you reach a point where you can no longer find a way up. Hopefully that’s the peak.

Most of the time, though, you are likely to end up on top of a smaller rock – a local maximum, rather than the true peak. In any case, this is the hill climbing algorithm.
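Something like this minimal sketch in R captures the idea (the terrain function here is made up purely for illustration):

```r
# Hill climbing on a made-up terrain function (purely illustrative).
# From x, look at the two neighbours one step away, move to the higher
# one, and stop when neither is higher -- possibly just a local peak.
terrain <- function(x) -(x - 3)^2 + 2 * sin(5 * x)

hill_climb <- function(x, step = 0.01, max_iter = 10000) {
  for (i in seq_len(max_iter)) {
    neighbours <- c(x - step, x + step)
    best <- neighbours[which.max(terrain(neighbours))]
    if (terrain(best) <= terrain(x)) break  # no way up from here
    x <- best
  }
  x
}

hill_climb(0)  # may stop on a "smaller rock" (local maximum), not the summit
```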

So back to my story. I reached the base of the hill and set off on the steepest marked path.

I puffed and panted, but I kept going. It was rather windy that day, and it was threatening to rain. I held my folded umbrella and camera tight, and went on. I got beautiful views of Edinburgh city, and captured some of them on camera. And after a while, I got tired, and decided to call my wife on FaceTime.

In any case, it appeared that I had a long way to go, given the rocks that went upwards just to my left (I was using a modified version of hill climbing, in that I stuck to marked paths; as I was to rediscover the following day, I have a fear of heights). And I told my wife that. And then suddenly the climb got easier. And before I knew it, I was descending. And soon enough I was at the bottom all over again!

And then I saw the peak. Basically, what I had been climbing all along was not the main hill at all! It was a “side hill”, which I later learnt is called the “Salisbury Crags”. I got down to the valley between the two hills and stared at it. I realised it was a “saddle point”, and, hungry and tired and not wanting to get soaked in the rain, I made my way out, hailed a cab and went home.

I wasn’t done yet. Determined to climb the “real peak”, I returned the next morning. Again I walked all the way to the base of the hill, and started my climb at the saddle point. It was a tough climb – while there were rough steps in some places, in others there were none. I kept climbing a few steps at a time, taking short breaks.

One such break happened to be too long, though, and gave me enough time to look down and feel scared. For a long time now I’ve had a massive fear of heights. Panic hit. I was afraid of going too close to the edge and falling off the hill. I decided to play it safe and turn back.

I came down and walked across the valley between the two hills. Energised, I had another go, from what was possibly an easier direction. But I was too tired, and I had to get back to the apartment and check out that morning. So I gave up once again.

I still have unfinished business in Edinburgh!


Genetic Algorithms

I first learnt about Genetic Algorithms almost exactly thirteen years ago, when they were taught to me by Prof. Deepak Khemani as part of a course on “artificial intelligence”. I remember enjoying the course a fair bit, and took a liking to the heuristics and “approximate solutions” after the mathematically intensive algorithms course of the previous semester.

The problem with the course, however, was that it didn’t require us to code the algorithms we had learnt (for which we were extremely thankful back then, since in term 5 of Computer Science at IIT Madras, this was pretty much the only course that didn’t involve too many programming assignments).

As a result, while I had learnt all these heuristics (hill climbing, simulated annealing, tabu search, genetic algorithms, etc.) fairly well in theory, I was at a complete loss as to how to implement any of them. And in the course of my work since, I had never had an opportunity to use any of these techniques. Until today, that is.

I can’t describe the problem here since this is a client assignment, but when I had to pick a subset of a large set that satisfied certain properties, I knew I needed a method that would find a good subset quickly. A “fitness function” quickly came to mind, and it was obvious that I should use genetic algorithms to solve the problem.

The key to using genetic algorithms is that you need to be able to encode the solution in the form of a string, and then define functions such as “crossover” and “mutation”. Given that I was looking for a subset, encoding it as a string was rather easy, and since my subsets were unordered, the crossover was also easy – basic random number generation. Within fifteen minutes of deciding I should use GA, the entire algorithm was in my head. It was only a question of implementing it.
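To give a flavour of what this looks like (minus the client specifics), here are the building blocks sketched in R. The universe, the subset size and especially the fitness function are all stand-ins – the real fitness function belongs to the client problem:

```r
# Stand-in building blocks for the subset-selection GA (illustrative only).
set.seed(42)
universe <- 1:100   # the large set to pick from
k <- 10             # the fixed subset size

# A candidate solution: k distinct elements, kept sorted since order is irrelevant
random_solution <- function() sort(sample(universe, k))

# Hypothetical fitness function, standing in for the client-specific one
fitness <- function(subset) sum(sqrt(subset))

# Mutation: swap one member of the subset for a random element outside it
mutate <- function(subset) {
  outside <- setdiff(universe, subset)
  subset[sample(k, 1)] <- sample(outside, 1)
  sort(subset)
}
```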

As I started writing the code, I fantasised about being able to finish it in an hour and then writing a blog post about it. As it happened, it took much longer. The first cut took some three hours (including some breaks), it wasn’t particularly good, and it was slow to converge. I tweaked things a bit, but they didn’t improve by much.

And that was when I realised that I had done the crossover wrong – when two strings had elements in common and needed to be crossed over, I had to take care that the common elements did not repeat in the same “child” (I needed the subsets to be of a fixed length). So that needed a tweak in the code. That done, the code still seemed inefficient.
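One way to implement that fix (a sketch, reusing the stand-in helpers above) is to draw the child from the union of the parents’ elements, so anything they share collapses into a single copy and the child stays at exactly k distinct elements:

```r
# Crossover for fixed-length, unordered subsets: draw k distinct elements
# from the union of the parents, so a shared element can't repeat in a child.
crossover <- function(a, b) {
  pool <- union(a, b)   # between k and 2k distinct elements
  sort(sample(pool, k))
}
```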

I had also been getting the selection wrong. If I started off with 10 strings, I would form 5 pairs from them (each string participating in exactly one pair), which would result in 10 new strings. Then I would put these 20 strings (the original 10 and the new 10) through a fitness test and discard the weakest 10. And iterate. The problem was that the strongest strings had as much chance of reproducing as the weakest. This was clearly not right.

So I tweaked the code so that the fitter strings had a higher chance of reproducing than the less fit ones. This required me to put the fitness test at the beginning of each iteration rather than at the end, and I had to refactor the code a little to make sure I didn’t repeat computations. Now I was drawing pairs of strings from the original “basket” in proportion to their fitness, crossing them over, and putting the children through the fitness test. And so forth.
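Put together (again a sketch built on the stand-in helpers above, not the client code), the loop works something like this:

```r
# One way to wire up the GA loop: fitness computed at the start of each
# iteration, parents drawn with probability proportional to fitness, and
# the weakest half of (parents + children) discarded each generation.
evolve <- function(pop, generations = 200) {
  n <- length(pop)
  for (g in seq_len(generations)) {
    scores <- vapply(pop, fitness, numeric(1))   # fitness test up front
    children <- lapply(seq_len(n), function(i) {
      parents <- sample(n, 2, prob = scores)     # fitter strings reproduce more
      mutate(crossover(pop[[parents[1]]], pop[[parents[2]]]))
    })
    everyone <- c(pop, children)
    keep <- order(vapply(everyone, fitness, numeric(1)), decreasing = TRUE)[1:n]
    pop <- everyone[keep]                        # keep the fittest n
  }
  pop[[1]]  # population is sorted by fitness, so this is the best found
}

best <- evolve(replicate(20, random_solution(), simplify = FALSE))
fitness(best)
```

This sketch recomputes fitness for the survivors each generation; caching those values is the sort of refactoring I mean when I say I made sure not to repeat computations.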

I’m extremely happy with the results of the algorithm – I got just the kind of output I had expected. More importantly, I enjoyed the process of coding the whole thing up. I did the entire coding in R, which is what I use for my data analysis (the data size meant I didn’t need anything quicker).

The more interesting part is that this only solved a very small part of the problem I’m trying to solve for my client. Tomorrow I’m back to solving a different part of the problem. Genetic algorithms have served their purpose. Back when I started this assignment I had no clue I would be using genetic algorithms. In fact, I had no clue what techniques I might use.

Which is why I get annoyed when people ask me what kind of techniques I use in my problem solving. Given the kind of problems I take on, most will involve a large variety of math, CS and statistics techniques, each of which will play only a small role in the entire solution. This is also the reason I get annoyed when people put the methods they are going to use on their pitch decks. To me, that gives the impression that they are solving a toy version of the problem and not the real problem – or that the consultants are grossly oversimplifying the problem to be solved.

PS: Much as some people might describe it that way, I wouldn’t describe Genetic Algorithms as “machine learning”. I think there’s way too much intervention on the part of the programmer for it to be described thus.