I’m not a data scientist

After a little over four years of trying to ride a buzzword wave, I hereby formally cease to call myself a data scientist. There are some ongoing assignments where that term is used to refer to me, and that usage will continue, but going forward I’m not marketing myself as a “data scientist”, and will not use the phrase “data science” to describe my work.

The basic problem is that over time the term has come to mean something rather specific, and that doesn’t represent me and what I do at all. So why did I go through this long journey of calling myself a “data scientist”, trying to fit in in the “data science community” and now exiting?

It all started with a need to easily describe what I do.

To recall, my last proper full-time job was as a Quant at a leading investment bank, when I got this idea that rather than building obscure models for trading obscure corner cases, I might as well use use my model-building skills to solve “real problems” in other industries which were back then not as well served by quants.

So I started calling myself a “Quant consultant”, except that nobody really knew what “quant” meant. I got variously described as a “technologist” and a “statistician” and “data monkey” and what not, none of which really captured what I was actually doing – using data and building models to help companies improve their businesses.

And then “data science” happened. I forget where I first came across this term, but I had been primed for it by reading Hal Varian saying that the “sexiest job in the next ten years will be statisticians”. I must mention that I had never come across the original post by DJ Patil and Thomas Davenport (that introduces the term) until I looked for it for my newsletter last year.

All I saw was “data” and “science”. I used data in my work, and I tried to bring science into the way my clients thought. And by 2014, Data Science had started becoming a thing. And I decided to ride the wave.

Now, data science has always been what artificial intelligence pioneer Marvin Minsky called a “suitcase term” – words or phrases that mean different things to different people (I heard about the concept first from this brilliant article on the “seven deadly sins of AI predictions“).

For some people, as long as some data is involved, and you do something remotely scientific it is data science. For others, it is about the use of sophisticated methods on data in order to extract insights. Some others conflate data science with statistics. For some others, only “machine learning” (another suitcase term!) is data science. And in the job market, “data scientist” can sometimes be interpreted as “glorified Python programmer”.

And right from inception, there were the data science jokes, like this one:

It is pertinent to put a whole list of it here.

‘Data Scientist’ is a Data Analyst who lives in California”
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
“A data scientist is a business analyst who lives in New York.”
“A data scientist is a statistician who lives in San Francisco.”
“Data Science is statistics on a Mac.”

I loved these jokes, and thought I had found this term that had rather accurately described me. Except that it didn’t.

The thing with suitcase terms is that they evolve over time, as they start getting used differentially in different contexts. And so it was with data science. Over time, it has been used in a dominant fashion by people who mean it in the “machine learning” sense of the term. In fact, in most circles, the defining features of data scientists is the ability to write code in python, and to use the scikit learn package – neither of which is my distinguishing feature.

While this dissociation with the phrase “data science” has been coming for a long time (especially after my disastrous experience in the London job market in 2017), the final triggers I guess were a series of posts I wrote on LinkedIn in August/September this year.

The good thing about writing is that it helps you clarify your mind, and as I ranted about what I think data science should be, I realised over time that what I have in mind as “data science” is very different from what the broad market has in mind as “data science”. As per the market definition, just doing science with data isn’t data science any more – instead it is defined rather narrowly as a part of the software engineering stack where problems are solved based on building machine learning models that take data as input.

So it is prudent that I stop using the phrase “data science” and “data scientist” to describe myself and the work that I do.

PS: My newsletter will continue to be called “the art of data science”. The name gets “grandfathered” along with other ongoing assignments where I use the term “data science”.