Segmentation and machine learning

For best results, use machine learning to do customer segmentation, but then get humans with domain knowledge to validate the segments

There are two common ways in which people do customer segmentation. The “traditional” method is to manually define the axes through which the customers will get segmented, and then simply look through the data to find the characteristics and size of each segment.

Then there is the “data science” way of doing it, which is to ignore all intuition, and simply use some method such as K-means clustering and “do gymnastics” with the data and find the clusters.

A quantitative extreme of this method is to do gymnastics with your data, get segments out of it, and quantitatively “take action” on it without really bothering to figure out what each clusters represent. Loosely speaking, this is how a lot of recommendation systems nowadays work – some algorithm somewhere finds people similar to you based on your behaviour, and recommends to you what they liked.

I usually prefer a sort of middle ground. I like to let the algorithms (k-means easily being my favourite) to come up with the segments based on the data, and then have a bunch of humans look at the segments and make sense of it.

Basically whatever segments are thrown up by the algorithm need to be validated by human intuition. Getting counterintuitive clusters is also not a problem – on several occasions, people I’ve validated the clusters by (usually clients) have used the counterintuitive clusters to discover bugs, gaps in the data  or patterns that they didn’t know of earlier.

Also, in terms of validation of clusters, it is always useful to get people with domain knowledge to validate the clusters. And this also means that whatever clusters you’ve generated you are able to represent them in a human-readable format. The best way of doing that is to use the cluster centres and then represent them somehow in a “physical” manner.

I started writing this post some three days ago and am only getting to finish it now. Unfortunately, in the meantime I’ve forgotten the exact motivation of why I started writing this. If i recall that, I’ll maybe do another post.

Taking Intelligence For Granted

There was a point in time when the use of artificial intelligence or machine learning or any other kind of intelligence in a product was a source of competitive advantage and differentiation. Nowadays, however, many people have got so spoiled by the use of intelligence in many products they use that it has become more of a hygiene factor.

Take this morning’s post, for example. One way to look at it is that Spotify with its customisation algorithms and recommendations has spoiled me so much that I find Amazon’s pushing of Indian music irritating (Amazon’s approach can be called as “naive customisation”, where they push Indian music to me only because I’m based in India, and not learn further based on my preferences).

Had I not been exposed to the more intelligent customisation that Spotify offers, I might have found Amazon’s naive customisation interesting. However, Spotify’s degree of customisation has spoilt me so much that Amazon is simply inadequate.

This expectation of intelligence goes beyond product and service classes. When we get used to Spotify recommending music we like based on our preferences, we hold Netflix’s recommendation algorithm to a higher standard. We question why the Flipkart homepage is not customised to us based on our previous shopping. Or why Google Maps doesn’t learn that some of us don’t like driving through small roads when we can help it.

That customers take intelligence for granted nowadays means that businesses have to invest more in offering this intelligence. Easy-to-use data analysis and machine learning packages mean that at least some part of an industry uses intelligence in at least some form (even if they might do it badly in case they fail to throw human intelligence into the mix!).

So if you are in the business of selling to end customers, keep in mind that they are used to seeing intelligence everywhere around them, and whether they state it or not, they expect it from you.

10X Studs and Fighters

Tech twitter, for the last week, has been inundated with unending debate on this tweetstorm by a VC about “10X engineers”. The tweetstorm was engineered by Shekhar Kirani, a Partner at Accel Partners.

I have friends and twitter-followees on both sides of the debate. There isn’t much to describe more about the “paksh” side of the debate. Read Shekhar’s tweetstorm I’ve put above, and you’ll know all there is to this side.

The vipaksh side argues that this normalises “toxicity” and “bad behaviour” among engineers (about “10X engineers”‘s hatred for meetings, and their not adhering to processes etc.). Someone I follow went to the extent to say that this kind of behaviour among engineers is a sign of privilege and lack of empathy.

This is just the gist of the argument. You can just do a search of “10X engineer”, ignore the jokes (most of them are pretty bad) and read people’s actual arguments for and against “10X engineers”.

Regular readers of this blog might be familiar with the “studs and fighters” framework, which I used so often in the 2007-9 period that several people threatened to stop reading me unless I stopped using the framework. I put it on a temporary hiatus and then revived it a couple of years back because I decided it’s too useful a framework to ignore.

One of the fundamental features of the studs and fighters framework is that studs and fighters respectively think that everyone else is like themselves. And this can create problems at the organisational level. I’d spoken about this in the introductory post on the framework.

To me this debate about 10X engineers and whether they are good or bad reminds me of the conflict between studs and fighters. Studs want to work their way. They are really good at what they’re competent at, and absolutely suck at pretty much everything else. So they try to avoid things they’re bad at, can sometimes be individualistic and prefer to work alone, and hope that how good they are at the things they’re good at will compensate for all that they suck elsewhere.

Fighters, on the other hand, are process driven, methodical, patient and sticklers for rules. They believe that output is proportional to input, and that it is impossible for anyone to have a 10X impact, even 1/10th of the time (:P). They believe that everyone needs to “come together as a group and go through a process”.

I can go on but won’t.

So should your organisation employ 10X engineers or not? Do you tolerate the odd “10X engineer” who may not follow company policy and all that in return for their superior contributions? There is no easy answer to this but overall I think companies together will follow a “mixed strategy”.

Some companies will be encouraging of 10X behaviour, and you will see 10X people gravitating towards such companies. Others will dissuade such behaviour and the 10X people there, not seeing any upside, will leave to join the 10X companies (again I’ve written about how you can have “stud organisations” and “fighter organisations”.

Note that it’s difficult to run an organisation with solely 10X people (they’re bad at managing stuff), so organisations that engage 10X people will also employ “fighters” who are cognisant that 10X people exist and know how they should be managed. In fact, being a fighter while recognising and being able to manage 10X behaviour is, I think, an important skill.

As for myself, I don’t like one part of Shekhar Kirani’s definition – that he restricts it to “engineers”. I think the sort of behaviour he describes is present in other fields and skills as well. Some people see the point in that. Others don’t.

Life is a mixed strategy.

Periodicals and Dashboards

The purpose of a dashboard is to give you a live view of what is happening with the system. Take for example the instrument it is named after – the car dashboard. It tells you at the moment what the speed of the car is, along with other indicators such as which lights are on, the engine temperature, fuel levels, etc.

Not all reports, however, need to be dashboards. Some reports can be periodicals. These periodicals don’t tell you what’s happening at a moment, but give you a view of what happened in or at the end of a certain period. Think, for example, of classic periodicals such as newspapers or magazines, in contrast to online newspapers or magazines.

Periodicals tell you the state of a system at a certain point in time, and also give information of what happened to the system in the preceding time. So the financial daily, for example, tells you what the stock market closed at the previous day, and how the market had moved in the preceding day, month, year, etc.

Doing away with metaphors, business reporting can be classified into periodicals and dashboards. And they work exactly like their metaphorical counterparts. Periodical reports are produced periodically and tell you what happened in a certain period or point of time in the past. A good example are company financials – they produce an income statement and balance sheet to respectively describe what happened in a period and at a point in time for the company.

Once a periodical is produced, it is frozen in time for posterity. Another edition will be produced at the end of the next period, but it is a new edition. It adds to the earlier periodical rather than replacing it. Periodicals thus have historical value and because they are preserved they need to be designed more carefully.

Dashboards on the other hand are fleeting, and not usually preserved for posterity. They are on the other hand overwritten. So whether all systems are up this minute doesn’t matter a minute later if you haven’t reacted to the report this minute, and thus ceases to be of importance the next minute (of course there might be some aspects that might be important at the later date, and they will be captured in the next periodical).

When we are designing business reports and other “business intelligence systems” we need to be cognisant of whether we are producing a dashboard or a periodical. The fashion nowadays is to produce everything as a dashboard, perhaps because there are popular dashboarding tools available.

However, dashboards are expensive. For one, they need a constant connection to be maintained to the “system” (database or data warehouse or data lake or whatever other storage unit in the business report sense). Also, by definition they are not stored, and if you need to store then you have to decide upon a frequency of storage which makes it a periodical anyway.

So companies can save significantly on resources (compute and storage) by switching from dashboards (which everyone seems to think in terms of) to periodicals. The key here is to get the frequency of the periodical right – too frequent and people will get bugged. Not frequent enough, and people will get bugged again due to lack of information. Given the tools and technologies at hand, we can even make reports “on demand” (for stuff not used by too many people).

Housewife Careers

This is something I’ve been wanting to write about for a very long time, but have kept putting it off. The ultimate trigger for writing this is this article about women with children in Amazon asking for backup child care at work. Since this hits rather close home, this is a good enough trigger to write.

Quoting the article:

“Everyone wants to act really tough and pretend they don’t have human needs,” says Kristi Coulter, who worked in various roles at Amazon for almost 12 years and observed that many senior executives had stay-at-home wives.

(emphasis mine)

While this might be true of Amazon (though not necessarily for other large tech companies), it is true for other careers as well. The nature of the job means that it is impossible to function if you even have partial child-care responsibilities. And that implies that the only way you can do this job is if you have a spouse whose full time job is bringing up the kids.

Without loss of generality (considering that in most cases it’s the women who give up their careers for child-rearing), we can call these jobs “housewife jobs”.

Housewife jobs are jobs where you can do a good job if an only if you have a spouse who spends all her time taking care of the kids. 

The main feature (I would say it is a bug, but whatever) of such a job is usually long work hours that require you to “overlap both ways” – both leave home early in the morning and return late every night, implying that even if you have to drop your kid to day care, it is your spouse who has to do so. And as I’ve found from personal experience, it is simply not possible to work profitably when you have both child-dropping and child-picking-up duties on a single day (unless you have zero commute, like I’ve had for the last eight months).

Housewife jobs also involve lots of travel. Whether it is overnight or not doesn’t matter, since you are likely to be away early mornings and late evenings at least, and this means (once again) that the spouse has to pick up the slack.

Housewife jobs also involve a lot of pressure, which means that even when you are done with work and want to relax with the kids, you are unable to take your mind off work. So it turns out to be rather unprofitable time with the kids – so you might as well spend that working. Which again means the spouse picks up the slack.

Sometimes a job may not be inherently stressful or require long hours, but might be housewife because the company is led by a bunch of people with housewives (the article linked above claims this about Amazon). What this means is that when there is a sufficient number of (mostly) men in senior management who have housewives taking care of kids, their way of working percolates through the culture of the organisation.

These organisations are more likely to demand “facetime” (not the Apple variety). They are more likely to value input more than output (thus privileging fighter work?). And soon people without housewives get crowded out of such organisations, making it even more housewife organisations.

Finally, you may argue that I’ve used UK-style nurseries as the dominant child care mechanism in my post (these usually run 8-6), and that it might be possible to hedge the situations completely with 24/7 nannies or Singapore-style “helpers”. Now, even with full time child care, there are some emergencies that occur from time to time which require the presence of at least one parent. And it can’t be the same parent providing that presence all the time. So if one of the parents is in a “housewife job”, things don’t really work out.

I guess it is not hard to work out a list of jobs or sectors which are inherently “housewife”. Look at where people quit once they have kids. Look at where people quit once they get married. Look at jobs that are staffed by rolling legions of fresh graduates (if you don’t have a kid, you don’t need a housewife).

The scary realisation I’m coming to is that most jobs are housewife jobs, and it is really not easy being a DI(>=1)K household.

Just Plot It

One of my favourite work stories is from this job I did a long time ago. The task given to me was demand forecasting, and the variable I needed to forecast was so “micro” (this intersection that intersection the other) that forecasting was an absolute nightmare.

A side effect of this has been that I find it impossible to believe that it’s possible to forecast anything at all. Several (reasonably successful) forecasting assignments later, I still dread it when the client tells me that the project in question involves forecasting.

Another side effect is that the utter failure of standard textbook methods in that monster forecasting exercise all those years ago means that I find it impossible to believe that textbook methods work with “real life data”. Textbooks and college assignments are filled with problems that when “twisted” in a particular way easily unravel, like a well-tied tie knot. Industry data and problems are never as clean, and elegance doesn’t always work.

Anyway, coming back to the problem at hand, I had struggled for several months with this monster forecasting problem. Most of this time, I had been using one programming language that everyone else in the company used. The code was simultaneously being applied to lots of different sub-problems, so through the months of struggle I had never bothered to really “look at” the data.

I must have told this story before, when I spoke about why “data scientists” should learn MS Excel. For what I did next was to load the data onto a spreadsheet and start looking at it. And “looking at it” involved graphing it. And the solution, or the lack of it, lay right before my eyes. The data was so damn random that it was a wonder that anything had been forecast at all.

It was also a wonder that the people who had built the larger model (into which my forecasting piece was to plug in) had assumed that this data would be forecast-able at all (I mentioned this to the people who had built the model, and we’ll leave that story for another occasion).

In any case, looking at the data, by putting it in a visualisation, completely changed my perspective on how the problem needed to be tackled. And this has been a learning I haven’t let go of since – the first thing I do when presented with data is to graph it out, and visually inspect it. Any statistics (and any forecasting for sure) comes after that.

Yet, I find that a lot of people simply fail to appreciate the benefits of graphing. That it is not intuitive to do with most programming languages doesn’t help. Incredibly, even Python, a favoured tool of a lot of “data scientists”, doesn’t make graphing easy. Last year when I was forced to use it, I found that it was virtually impossible to create a PDF with lots of graphs – something that I do as a matter of routine when working on R (I subsequently figured out a (rather inelegant) hack the next time I was forced to use Python).

Maybe when you work on data that doesn’t have meaningful variables – such as images, for example – graphing doesn’t help (since a variable on its own has little information). But when the data remotely has some meaning – sales or production or clicks or words, graphing can be of immense help, and can give you massive insight on how to develop your model!

So go ahead, and plot it. And I won’t mind if you fail to thank me later!

Elegant and practical solutions

There are two ways in which you can tie a shoelace – one is the “ordinary method”, where you explicitly make the loops around both ends of the lace before tying together to form a bow. The other is the “elegant method” where you only make one loop explicitly, but tie with such great skill that the bow automatically gets formed.

I have never learnt to tie my shoelaces in the latter manner – I suspect my father didn’t know it either, because of which it wasn’t passed on to me. Metaphorically, however, I like to implement such solutions in other aspects.

Having been educated in mathematics, I’m a sucker for “elegant solutions”. I look down upon brute force solutions, which is why I might sometimes spend half an hour writing a script to accomplish a repetitive task that might have otherwise taken 15 minutes. Over the long run, I believe, this elegance will pay off, in terms of scaling easier.

And I suspect I’m not alone in this love for elegance. If the world were only about efficiency, brute force would prevail. That we appreciate things like poetry and music and art and what not means that there is some preference for elegance. And that extends to business solutions as well.

While going for elegance is a useful heuristic, sometimes it can lead to missing the woods for the trees (or missing the random forests for the decision trees if you may will). For there are situations that simply don’t, or won’t, scale, and where elegance will send you on a wild goose chase while a little fighter work will get the job done.

I got reminded of this sometime last week when my wife asked me for some Excel help in some work she was doing. Now, there was a recent article in WSJ which claimed that the “first rule of Microsoft Excel is that you shouldn’t let people know you’re good at it”. However, having taught a university course on spreadsheet modelling, there is no place to hide for me, and people keep coming to me for Excel help (though it helps I don’t work in an office).

So the problem wasn’t a simple one, and I dug around for about half an hour without a solution in sight. And then my wife happened to casually mention that this was a one-time thing. That she had to solve this problem once but didn’t expect to come across it again, so “a little manual work” won’t hurt.

And the problem was solved in two minutes – a minor variation of the requirement was only one formula away (did you know that the latest versions of Excel for Windows offer a “count distinct” function in pivot tables?). Five minutes of fighter work by the wife after that completely solved the problem.

Most data scientists (now that I’m not one!)  typically work in production environments, where the result of their analysis is expressed in code that is run on a repeated basis. This means that data scientists are typically tuned to finding elegant solutions since any manual intervention means that the code is not production-able and scalable.

This can mean finding complicated workarounds in order to “pull the bow of the shoelaces” in order to avoid that little bit of manual effort at the end, so that the whole thing can be automated. And these habits can extend to the occasional work that is not needed to be repeatable and scalable.

And so you have teams spending an inordinate amount of time finding elegant solutions for problems for which easy but non-scalable “solutions exist”.

Elegance is a hard quality to shake off, even when it only hinders you.

I’ll close with a fairytale – a deer looks at its reflection and admires its beautiful anchors and admonishes its own ugly legs. Lion arrives, the ugly legs help the deer run fast, but the beautiful antlers get stuck in a low tree, and the lion catches up.

 

I’m not a data scientist

After a little over four years of trying to ride a buzzword wave, I hereby formally cease to call myself a data scientist. There are some ongoing assignments where that term is used to refer to me, and that usage will continue, but going forward I’m not marketing myself as a “data scientist”, and will not use the phrase “data science” to describe my work.

The basic problem is that over time the term has come to mean something rather specific, and that doesn’t represent me and what I do at all. So why did I go through this long journey of calling myself a “data scientist”, trying to fit in in the “data science community” and now exiting?

It all started with a need to easily describe what I do.

To recall, my last proper full-time job was as a Quant at a leading investment bank, when I got this idea that rather than building obscure models for trading obscure corner cases, I might as well use use my model-building skills to solve “real problems” in other industries which were back then not as well served by quants.

So I started calling myself a “Quant consultant”, except that nobody really knew what “quant” meant. I got variously described as a “technologist” and a “statistician” and “data monkey” and what not, none of which really captured what I was actually doing – using data and building models to help companies improve their businesses.

And then “data science” happened. I forget where I first came across this term, but I had been primed for it by reading Hal Varian saying that the “sexiest job in the next ten years will be statisticians”. I must mention that I had never come across the original post by DJ Patil and Thomas Davenport (that introduces the term) until I looked for it for my newsletter last year.

All I saw was “data” and “science”. I used data in my work, and I tried to bring science into the way my clients thought. And by 2014, Data Science had started becoming a thing. And I decided to ride the wave.

Now, data science has always been what artificial intelligence pioneer Marvin Minsky called a “suitcase term” – words or phrases that mean different things to different people (I heard about the concept first from this brilliant article on the “seven deadly sins of AI predictions“).

For some people, as long as some data is involved, and you do something remotely scientific it is data science. For others, it is about the use of sophisticated methods on data in order to extract insights. Some others conflate data science with statistics. For some others, only “machine learning” (another suitcase term!) is data science. And in the job market, “data scientist” can sometimes be interpreted as “glorified Python programmer”.

And right from inception, there were the data science jokes, like this one:

It is pertinent to put a whole list of it here.

‘Data Scientist’ is a Data Analyst who lives in California”
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
“A data scientist is a business analyst who lives in New York.”
“A data scientist is a statistician who lives in San Francisco.”
“Data Science is statistics on a Mac.”

I loved these jokes, and thought I had found this term that had rather accurately described me. Except that it didn’t.

The thing with suitcase terms is that they evolve over time, as they start getting used differentially in different contexts. And so it was with data science. Over time, it has been used in a dominant fashion by people who mean it in the “machine learning” sense of the term. In fact, in most circles, the defining features of data scientists is the ability to write code in python, and to use the scikit learn package – neither of which is my distinguishing feature.

While this dissociation with the phrase “data science” has been coming for a long time (especially after my disastrous experience in the London job market in 2017), the final triggers I guess were a series of posts I wrote on LinkedIn in August/September this year.

The good thing about writing is that it helps you clarify your mind, and as I ranted about what I think data science should be, I realised over time that what I have in mind as “data science” is very different from what the broad market has in mind as “data science”. As per the market definition, just doing science with data isn’t data science any more – instead it is defined rather narrowly as a part of the software engineering stack where problems are solved based on building machine learning models that take data as input.

So it is prudent that I stop using the phrase “data science” and “data scientist” to describe myself and the work that I do.

PS: My newsletter will continue to be called “the art of data science”. The name gets “grandfathered” along with other ongoing assignments where I use the term “data science”.

Networking events and positions of strength

This replicates some of the stuff I wrote in a recent blog post, but I put this on LinkedIn and wanted a copy here for posterity 

Having moved my consulting business to London earlier this year, I’ve had a problem with marketing. The basic problem is that while my network and brand is fairly strong in India, I’ve had to start from scratch in the UK.

The lack of branding has meant that I have often had to talk or negotiate from a position of weakness (check out my recent blog post on branding as creating a position of strength). The lack of network has meant that I try to go to networking events where I can meet people and try to improve my network. Except that the lack of branding means that I have to network from a position of weakness and hence not make an impact.

A few months back I came across this set of tweets by AngelList founder Naval Ravikant, in which he talked about productivity hacks.

One that caught my eye, which I try to practice but have not always been able to practice, is on not going to conferences if you are not speaking. However, now that I think about it from the point of view of branding and positions of strength, what he says makes total sense.

In conferences and networking events, there is usually a sort of unspoken hierarchy, where speakers are generally “superior” to those in the audience. This flows from the assumption that the audience has come to gather pearls of wisdom from the speakers. And this has an impact on the networking around the event – if you are speaking, people will start with the prior of your being a superior being, compared to you going as an audience member (especially if it is a paid event).

This is not a strict rule – when there are other people at the event who you know, it is possible that their introductions can elevate you even if you are not speaking. However, if you are at an event where you don’t know anyone else, you surely start on higher ground (no pun intended) in case you are speaking.

There is another advantage that speaking offers – you can use your speech itself to build your brand, which will be fresh in your counterparties’s minds in the networking immediately afterward. Audience members have no such brand-building ability, apart from the possibility of tarnishing their own brands through inappropriate or rambling questions.

So unless you see value in what the speaker(s) say, don’t go to conferences. Putting it another way, don’t go to conferences for networking alone, unless you are speaking. Extending this, don’t go to networking events unless you either know some of the other people who are coming there (whose links you can then tap) or if there is an opportunity for you to elevate your brand at the event (by speaking, for example).

PS: Some of Naval’s other points such as having “meeting days” and scheduling meetings for later in the day are pertinent as well, and I’ve found them to be incredibly useful.

Triangle marketing

This blog post is based more on how I have bought rather than how I have sold. The basic concept is that when you hear about a product or service from two or more independent sources, you are more likely to buy it.

The threshold varies by the kind of product you are looking at. When it is a low touch item like a book, two independent recommendations are enough. When it involves higher cost and has higher impact, like a phone, it might be five recommendations. For something life changing like a keto diet, it might be ten (I must mention I tried keto for half a day and gave up, not least because I figured I don’t really need it – I’m barely 3-4 kg overweight).

The important point to note is that the recommendations need to come from independent sources – if two people who you didn’t expect to have a similar taste in books were to recommend the same book, the second of these recommendations is likely to create an “aha moment” (ok I’m getting into consultant-speak now), and that is likely to drive a purchase (or at least trying a Kindle sample).

In some ways, exposure to the same product through independent sources is likely to create a feeling of a self-fulfilling prophecy. “Alice is also using this. Bob is also using this” will soon go into “everybody seems to be using it. I should also use it”.

So what does this mean to you if you are a seller? Basically you need to hit your target audience through various channels. I had mentioned in my post earlier this week about how branding creates a “position of strength“, and how direct sales is normally hard because it is done through a position of weakness.

The idea is that before you hit your audience with a direct sale, you need to “warm them up” with your brand, and you need to do this through various channels. Your brand needs to impact on your audience through multiple independent channels, so that it has become a self-fulfilling prophecy before you approach to make the sale.

What these precise channels are depends on your business and the product that you’re trying to sell, but the important thing is that they are independent. So for example, putting advertisements in various places won’t help since the target will treat all of them as coming from the same source.

Finally, where is the “triangle” in this marketing? It is in the idea that you complete the branding and sales by means of “triangulation”. You send out vectors in seemingly random directions trying to build your brand, and they will get reflected till a time when they intersect, or “triangulate”. Ok I know my maths here is messy ant not up to my usual standard, but I guess you know what I’m getting at!