Data Science is a Creative Profession

About a month or so back I had a long telephonic conversation with this guy who runs an offshored analytics/data science company in Bangalore. Like most other companies that are being built in the field of analytics, this follows the software services model – a large team in an offshored location, providing long-term standardised data science solutions to a client in a different “geography”.

As is usual with conversations like this one, we talked about our respective areas of work and the kind of projects we take on, and soon we got to the usual bit in such conversations where we were trying to “find synergies”. Things were going swimmingly when this guy remarked that it was the first time he was coming across a freelancer in this profession. “I’ve heard of freelance designers and writers, but never freelance data scientists or analytics professionals”, he mentioned.

In a separate event I was talking to one old friend about another old friend who has set up a one-man company to provide what are basically freelance consulting services. We reasoned that this guy had set up a company rather than calling himself a freelancer because of the reputation that “freelancers” (irrespective of the work they do) have – if you say you are a freelancer people think of someone smoking pot and working in a coffee shop on a Mac. If you say you are a partner or founder of a company, people imagine someone more corporate.

Now that the digression is out of the way let us get back to my conversation with the guy who runs the offshored shop. During the conversation I didn’t say much, just saying things like “what is wrong with being a freelancer in this profession”. But now that I think more about it, it is simply a function of the profession being a fundamentally creative profession.

For a large number of people, data science is simply about statistics, or “machine learning” or predictive modelling – it is about being given a problem expressed in statistical terms and finding the best possible model and model parameters for it. It is about being given a statistical problem and finding a statistical solution. I’m not saying, of course, that statistical modelling is not a creative profession – there is a fair bit of creativity involved in figuring out what kind of model to build, and picking the right model for the right data. But when you have a large team working on the problem, working effectively like an assembly line (with different people handling different parts of the solution), what you get is effectively an “assembly line solution”.

Coming back, let us look at this “a day in the life” post I wrote about a year back about a particular day in office for me. I’ve detailed in that the various kinds of problems I had to solve that day – from hidden Markov models and Bayesian probability to writing code using dynamic programming and implementing the code in R, and then translating the solution back to the business context. Notice that when I started off working on the problem it was not known what domain the problem belonged in – it took some poking and prodding around in order to figure out the nature of the problem and the first step in the solution.

From then on, it was one step leading to another, and there are two important facts to consider about each step. Firstly, at each step, it wasn’t clear what the best class of technique was to get beyond the step – it took exploration to figure that out. Next, at no point in time was it known what the next step was going to be until the current step was solved. You can see that it is hard to do it in an assembly line fashion!

Now, you can talk about it being like a game of chess where you aren’t sure what the opponent will do, but then in chess the opponent is a rational human being, while here the “opponent” is basically the data and the patterns it shows, and there is no way to know how the data will react to something until you try it. So it is impossible to list out all the steps beforehand and solve it – the solution is an exploratory process.

And since solving a “data science problem” (as I define it, of course) is an exploratory, and thus creative, process, it is important to work in an atmosphere that fosters creativity and “thinking without thinking” (basically keep a problem in the back of your mind and then take your mind off it, and distract yourself to solve the problem). This is best done away from a traditional corporate environment – where you have to attend meetings and are liable to be disturbed by colleagues at all times, and this is why a freelance model is actually ideal! A small partnership also works – while you might find it hard to “assembly line” the problem, having someone to bounce thoughts and ideas off can have a positive impact on the creative process. Anything more like a corporate structure and you are removing the conditions necessary to foster creativity, and are in such situations more likely to come up with cookie-cutter solutions.

So unless your business model deals with doing repeatable and continuous analytical work for a client, you are better off organising yourselves in an environment that fosters creativity and not a traditional office kind of structure if you want to solve problems using data science. Then again, your mileage might vary!

The Importance of Discipline

I’ve never been a fan of discipline. I think it is a major constraint and hinders creativity, and puts too many walls within which you need to live your life. Despite constant exhortations by my father, I never wanted to join the army. Hell, I even tried my best (successfully) to avoid NCC when I was at IIT. I pride myself on being some sort of a free spirit who isn’t held back by any arbitrary rules that I create for myself to live my life by.

A really nice article that I read today, however, makes me think twice about this stand. So this article is about “decision fatigue” and is not very dissimilar to what I’d read a long time back (again in the NYT) about the Law of Conservation of Willpower. So this article talks about how every time you need to make a decision it consumes some part of your mental energy. Irrespective of the size of the decision that is to be made, there is some willpower that is lost, and that causes you to be suboptimal in your decision making as the day progresses.

The article really struck a chord with me, and I realize I’m also heavily prone to decision fatigue. Sometimes the smallest decisions take away so much energy from me that I simply put NED. And yeah, on a related note, I’ve got the wife upset innumerable times solely because of my indecisiveness, a part of which can be attributed to decision fatigue. I even remember not going to a wedding reception some three years back because I couldn’t decide which shirt to wear! And no, I’m not making this up.

So on that note, here’s where I think discipline has a part to play in life. By putting certain constraints on your life, you are reducing the number of decisions that you have to make. And that implies your willpower and mental energy will be reserved for those things where it’s really important that you decide carefully. By making a schedule for yourself, you are outsourcing to you-the-planner all the trivial decisions of your life. Yes, you might feel constrained at times. But it saves you so much energy by way of saving you from several trivial decisions.

Of course, feeling constrained can also affect your mental energy in a negative way, and prevent you from giving your best. Nevertheless, this decision fatigue thingy implies that discipline may not be all that bad. Or maybe I need to think about it some more.

Rajkumar Hirani Copycat

Ok this post has nothing to do with Five Point Someone or its related controversies. Yeah, the story is inspired by 5PS more than the claimed 3% but I’ll let Chetan Bhagat and his army of followers fight out that battle. Copying from others is honourable, at least you are taking inspiration from someone. What is just not done is copying from oneself. It simply shows a lack of creativity and a laziness about coming up with new ideas.

Maybe when Rajkumar Hirani made 3 Idiots, he assumed that the public would have forgotten Munnabhai MBBS. He assumed that Munnabhai MBBS would be so out of circulation that it would have gone out of people’s minds, eclipsed by the more successful sequel Lage Raho. What he didn’t bargain for was that Munnabhai MBBS was on the menu on the New York JFK to Dubai Emirates Airlines flight, and that people like me would watch it within 3 weeks of watching 3 Idiots.

The similarities are uncanny. Both colleges are “Imperial”, have Boman Irani playing the “big prof” (diro here, dean there), and acting similarly in both. Both have a nerdy Tam who comes 2nd in class, 2nd to the hero. Yeah, Chatur is caricatured in 3I while Swami is given a more positive role in Munnabhai. Both are about the system, about how the larger-than-life hero fights the system and makes the big prof realize that the way he has been running the institution is wrong. The hero’s love interest is the big prof’s daughter. And so on. Just that Munnabhai and Rancho use different methods to achieve their goals, that’s all.

I suppose most of you would have watched 3 Idiots recently. I urge you to pick up a DVD or a torrent of Munnabhai MBBS and watch it, again. And keep an eye out for the similarities. You will be convinced that Rajkumar Hirani is guilty of copying, from his own stuff. It is indeed sad to see a good director such as him stooping to Anu Malik* depths.

While on the topic of 3 Idiots, my esteemed colleague Baada wanted me to do a stud-fighter post on the movie. I suppose all of you who have seen the movie will easily figure out why the framework fits. I don’t think it needs any more explanation from the resident stud-fighter expert, that is me. Also, if you recall, I had taken a vow that I won’t do any more stud-fighter blogging. Though I must mention that my book on the topic is going nowhere.

* Listen to the prelude music of Ae Mere Humsafar from Baazigar, and then to the title song of Ishq. Next, listen to the interlude music of Kitaben Bahut Si, again from Baazigar, and then to the title song from Fiza. The self-copy is obvious. And I must mention that I had used this concept in a quiz question, twice. Yeah, I’ve also been guilty of “petering” my own questions.