Elegant and practical solutions

There are two ways in which you can tie a shoelace – one is the “ordinary method”, where you explicitly make the loops around both ends of the lace before tying together to form a bow. The other is the “elegant method” where you only make one loop explicitly, but tie with such great skill that the bow automatically gets formed.

I have never learnt to tie my shoelaces in the latter manner – I suspect my father didn’t know it either, which is why it wasn’t passed on to me. Metaphorically, however, I like to implement such solutions in other aspects of life.

Having been educated in mathematics, I’m a sucker for “elegant solutions”. I look down upon brute force solutions, which is why I might sometimes spend half an hour writing a script to accomplish a repetitive task that might otherwise have taken 15 minutes. Over the long run, I believe, this elegance will pay off, because elegant solutions scale more easily.

And I suspect I’m not alone in this love for elegance. If the world were only about efficiency, brute force would prevail. That we appreciate things like poetry and music and art and what not means that there is some preference for elegance. And that extends to business solutions as well.

While going for elegance is a useful heuristic, sometimes it can lead to missing the woods for the trees (or missing the random forests for the decision trees, if you will). For there are situations that simply don’t, or won’t, scale, and where elegance will send you on a wild goose chase while a little fighter work will get the job done.

I was reminded of this sometime last week when my wife asked me for some Excel help with some work she was doing. Now, there was a recent article in WSJ which claimed that the “first rule of Microsoft Excel is that you shouldn’t let people know you’re good at it”. However, having taught a university course on spreadsheet modelling, I have no place to hide, and people keep coming to me for Excel help (though it helps that I don’t work in an office).

So the problem wasn’t a simple one, and I dug around for about half an hour without a solution in sight. And then my wife happened to casually mention that this was a one-time thing. That she had to solve this problem once but didn’t expect to come across it again, so “a little manual work” won’t hurt.

And the problem was solved in two minutes – a minor variation of the requirement was only one formula away (did you know that the latest versions of Excel for Windows offer a “count distinct” function in pivot tables?). Five minutes of fighter work by the wife after that completely solved the problem.
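For readers who haven’t used it, here is a minimal Python sketch of what a “count distinct” in a pivot table computes: for each group, the number of different values rather than the number of rows. The data, the column meanings and the function name `count_distinct` are all made up for illustration; this is not the actual problem my wife was solving.

```python
from collections import defaultdict

# Hypothetical rows of (region, customer); made-up data for illustration only.
rows = [
    ("North", "Asha"),
    ("North", "Asha"),
    ("North", "Ravi"),
    ("South", "Meera"),
    ("South", "Meera"),
]

def count_distinct(rows):
    """Return {group: number of distinct values in that group}."""
    seen = defaultdict(set)
    for group, value in rows:
        seen[group].add(value)  # a set ignores repeated values
    return {group: len(values) for group, values in seen.items()}

result = count_distinct(rows)
print(result)
```

An ordinary count would report three rows for North and two for South; the distinct count collapses the repeats within each group.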

Most data scientists (now that I’m not one!) typically work in production environments, where the result of their analysis is expressed in code that is run on a repeated basis. This means that data scientists are typically tuned to finding elegant solutions since any manual intervention means that the code is not production-able and scalable.

This can mean finding complicated workarounds to “pull the bow of the shoelaces”, just to avoid that little bit of manual effort at the end, so that the whole thing can be automated. And these habits can extend to the occasional piece of work that doesn’t need to be repeatable and scalable.

And so you have teams spending an inordinate amount of time finding elegant solutions for problems for which easy but non-scalable “solutions” exist.

Elegance is a hard quality to shake off, even when it only hinders you.

I’ll close with a fairytale – a deer looks at its reflection and admires its beautiful antlers and admonishes its own ugly legs. A lion arrives, the ugly legs help the deer run fast, but the beautiful antlers get stuck in a low tree, and the lion catches up.

 

Why data scientists should be comfortable with MS Excel

Most people who call themselves “data scientists” aren’t usually fond of MS Excel. It is slow and clunky, can only handle about a million rows of data (and nearly crashes your computer if you go anywhere close to that), and despite the best efforts of Visual Basic, is not very easy to program for doing repeatable tasks.

In fact, some data scientists may consider Excel to be “too downmarket” for them to use. At one firm I worked for, I had heard a rumour that using Excel for modelling was a fire-able offence, though I’m glad to report that I flouted this rule without much adverse effect. Yet, in my years as a “data science” and analytics consultant, and having done several modelling jobs before, I think Excel is an extremely necessary tool in a data scientist’s arsenal. There are several reasons for this.

The main one is communication. “Business types” love Excel – they use it for pretty much every official activity (I know of people who write documents in Excel). If you ask for a set of numbers, you are most likely to find it in an Excel sheet. I know of fairly large organisations which use Excel to store and transmit data (admittedly poor usage). And even non-quantitative business types understand some of the basic quantitative functions thanks to Excel, such as joining (VLOOKUP), pivoting, basic data cleaning (TRIM, VALUE, etc.), averaging, visualisation and sometimes even basic statistics such as correlation and regression.
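To make the common ground concrete, here is a rough sketch of how three of those Excel functions map to plain Python. The helper names `trim`, `value` and `vlookup`, and all of the data, are hypothetical; they are loose analogues for illustration, not exact reimplementations of Excel’s behaviour.

```python
# Hypothetical lookup table and messy input; all names and numbers are made up.
prices = {"apple": 30.0, "banana": 5.0}  # plays the role of the lookup range
orders = [("  apple ", "2"), ("banana", " 10 "), ("cherry", "1")]

def trim(text):
    """Rough analogue of TRIM: strip the ends and collapse runs of whitespace."""
    return " ".join(text.split())

def value(text):
    """Rough analogue of VALUE: turn numeric text into a number."""
    return float(text.strip())

def vlookup(key, table, default=None):
    """Rough analogue of an exact-match VLOOKUP; default stands in for #N/A."""
    return table.get(key, default)

cleaned = []
for item, qty in orders:
    name = trim(item)
    price = vlookup(name, prices)
    # Leave the amount empty when the item is missing from the lookup table.
    amount = None if price is None else price * value(qty)
    cleaned.append((name, amount))

print(cleaned)
```

A business user who has written a VLOOKUP already understands the join in the loop above, which is the point: the concepts are shared even when the tools differ.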

One of the main problems that organisations face is lack of communication between data scientists and the business side (I mentioned this in a talk I gave last month: video here and slides here). Excel is an excellent middle ground, since it is reasonably quantitative and business people know how to use it.

In fact, in my consulting experience I’ve found that when working with clients, using Excel can make your client (usually a business person) feel more comfortable and involved in the analysis, speeding up the process and significantly improving collaboration. They’ll feel more empowered to intervene, which means they can add value, and they can feel especially happy if you occasionally let them enter some simple quantitative formulae.

The next advantage of Excel is that it puts the numbers out there. A long time back, when I was still doing full time jobs, I was asked to build a forecasting model (using a programming language) and couldn’t get it right for several months. And then on a whim I decided to use Excel, and when I saw the data in front of me, it was clear why the forecasts were so useless – because the data was so random.

Excel also allows you to quickly try things and iterate, again by putting the data and the analysis in front of you. Admittedly, the toolkit available is limited compared to what programming languages or statistical software can offer, but through clever usage (especially with Visual Basic), there is a lot you can achieve.

Then, Excel sometimes nudges you towards finding simple solutions. It is possible when you’re using a programming language to veer towards overly complicated solutions, and possibly use the proverbial nuclear weapon against the sparrow.

When I was doing that forecasting work a decade ago, I found that the forecasts would feed into a fairly complicated-looking model that had been developed over several years by several developers. On a whim, I decided to “do more” in Excel and managed to replicate the entire model in Excel (using VB and Solver). The people leading the product weren’t particularly happy, but using Excel was critical in ultimately moving to a simpler solution.

A similar thing occurred recently as well. I had been building a fairly complex optimisation model, which I tried replicating in Excel for communication purposes (so I could work on it together with the client). And it turned out there was a far simpler solution that I had missed all this time, and the simpler solution became apparent only because I used Excel.

I’m sure this is not an exhaustive list. So, if you’re a data scientist, you will do well to be at least conversant with Excel. I know it may only serve limited needs in terms of analysis, but the effort of learning it will be more than compensated for by the gains in communication, collaboration and simplicity.

Tailpiece:
A long time ago, a co-worker passed by my desk and saw me work on Excel. He saw my spreadsheet and remarked, “oh, so many numbers! it must be very complicated” and went on his way. I don’t know if he is a data scientist now.