Black Box Models

A few years ago, Felix Salmon wrote an article in Wired called “The Formula That Killed Wall Street”. It was about the “Gaussian copula”, a formula for estimating the joint probability of a set of events happening, given their individual probabilities and an assumption about how they are correlated. It was a mathematical breakthrough.

Unfortunately, it fell into the hands of quants and traders who didn’t fully understand it, and they used it to derive joint probabilities of a large number of instruments put together. What they did not realize was that there was an error in the model (as there is in all models). When they used the formula to tie up a large number of instruments, this error cascaded, resulting in an extremely inaccurate model and subsequent massive losses (this paragraph is based on my reading of the situation; your mileage might vary).
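To get a feel for how a small error can cascade when many instruments are tied together, here is a rough sketch. It is not the formula as the traders actually used it. It assumes a simplified one-factor Gaussian copula in which every instrument has the same default probability p and every pair shares the same correlation rho. The function name and all the numbers are made up for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def joint_default_probability(p, rho, n, steps=4000):
    """Probability that all n instruments default together.

    Assumes a one-factor Gaussian copula: each instrument defaults with
    marginal probability p, and every pair has correlation rho
    (0 <= rho < 1). Integrates over the shared factor with the
    trapezoid rule.
    """
    threshold = STD_NORMAL.inv_cdf(p)
    lo, hi = -8.0, 8.0  # the normal density is negligible outside this range
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        m = lo + i * h
        # Default probability of one instrument, given the shared factor m
        cond = STD_NORMAL.cdf(
            (threshold - math.sqrt(rho) * m) / math.sqrt(1.0 - rho)
        )
        weight = 0.5 if i in (0, steps) else 1.0
        # Given m, the instruments are independent, hence cond ** n
        total += weight * (cond ** n) * STD_NORMAL.pdf(m)
    return total * h


if __name__ == "__main__":
    p = 0.05  # hypothetical marginal default probability
    for n in (2, 5, 20):
        assumed = joint_default_probability(p, 0.2, n)  # what the model assumes
        actual = joint_default_probability(p, 0.3, n)   # what the world does
        print(f"n={n:2d}  assumed={assumed:.3e}  actual={actual:.3e}  "
              f"ratio={actual / assumed:.2f}")
```

The thing to look at is the ratio column: the same misjudgment of the correlation is applied in every row, and you can see how much it matters as more instruments are bundled together.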

In a blog post earlier this week at Reuters, Salmon returned to this article. He said:

 And you can’t take technology further than its natural limits, either. It wasn’t really the Gaussian copula function which killed Wall Street, nor was it the quants who wielded it. Rather, it was the quants’ managers — the people whose limited understanding of copula functions and value-at-risk calculations allowed far too much risk to be pushed out into the tails. On Wall Street, just as in the rest of industry, a little bit of common sense can go a very long way.

I’m completely with him on this one. This blog post was in reference to Salmon’s latest article in Wired, which is about the four stages in which quants disrupt industries. You are encouraged to read both the Wired article and the blog post about it.

The essence is that it is easy to over-do analytics. Once you have a model that works in a few cases, you will end up putting too much faith into the model, and soon the model will become gospel, and you will build the rest of the organization around the model (this is Stage Three that Salmon talks about). For example, a friend who is a management consultant once mentioned how bank lending practices are now increasingly formula driven. He mentioned reading a manager’s report that said “I know the applicant well, and am confident that he will repay the loan. However, our scoring system ranks him too low, hence I’m unable to offer the loan”.

The key issue, as Salmon mentions in his blog post, is that managers need to have at least a basic understanding of analytics (I had touched upon this issue in an earlier blog post). As I had written in that blog post, there can be two ways in which the analytics team can end up not contributing to the firm. Firstly, people think of them as geeks whom nobody understands, and ignore them. Secondly, and perhaps more dangerously, people think of the analytics guys as gods, and fail to challenge them sufficiently, thus putting too much faith in models.

From this perspective, it is important for the analytics team to communicate well with the other managers – to explain the basic logic behind the models, so that the managers can understand the assumptions and limitations, and can use the models in the intended manner. What usually happens, though, is that after a few attempts when management doesn’t “get” the models, the analytics people resign themselves to using technical jargon and three letter acronyms to bulldoze their models past the managers.

The point of this post, however, is about black box models. Sometimes, you can have people (either analytics professionals or managers) using models without fully understanding them and their assumptions. This inevitably leads to disaster. A good example of this is the traders and quants who used David Li’s Gaussian Copula, and ended up with horribly wrong models.

In order to prevent this, a good practice would be for the analytics people to be able to explain the model in an intuitive fashion (without using jargon) to the managers, so that they all understand the essence and nuances of the model in question. This, of course, means that you need to employ analytics people who are capable of effectively communicating their ideas, and employ managers who are able to at least understand some basic quant.

Should you have an analytics team?

In an earlier post, I had talked about the importance of business people knowing numbers and numbers people knowing business, and had put in a small advertisement for my consulting services by mentioning that I know both business and numbers and work at their cusp. In this post, I take that further and analyze if it makes sense to have a dedicated analytics team.

Following the data boom, most companies have decided (rightly) that they need to do something to take advantage of all the data that they have and have created dedicated analytics teams. These teams, normally staffed with people from a quantitative or statistical background, with perhaps a few MBAs, are in charge of taking care of all the data the company has along with doing some rudimentary analysis. The question is if having such dedicated teams is effective or if it is better to have numbers-enabled people across the firm.

Having an analytics team makes sense from the point of view of economies of scale. People who are conversant with numbers are hard to come by, and when you find some, it makes sense to put them together and get them to work exclusively on numerical problems. That also ensures collaboration and knowledge sharing and that can have positive externalities.

Then, there is the data aspect. Anyone doing business analytics within a firm needs access to data from all over the firm, and if the firm doesn’t have a centralized data warehouse which houses all its data, one task of each analytics person would be to get together the data that they need for their analysis. Here again, the economies of scale of having an integrated analytics team work. The job of putting together data from multiple parts of the firm is not solved multiple times, and thus the analysts can spend more time on analyzing rather than collecting data.

So far so good. However, a while back I had explained that investment banks’ policies of having exclusive quant teams have doomed them to long-term failure. My contention there (including an insider view) was that an exclusive quant team whose only job is to model and which doesn’t have a view of the market can quickly get insular, and can lead to groupthink. People are more likely to solve for problems as defined by their models rather than problems posed by the market. This, I had mentioned, can soon lead to a disconnect between the bank’s models and the markets, and ultimately lead to trading losses.

Extending that argument, it works the same way with non-banking firms as well. When you put together a group of numbers people and call them the analytics group, and only give them the job of building models rather than looking at actual business issues, they are likely to get similarly insular and opaque. While initially they might do well, soon they start getting disconnected from the actual business the firm is doing, and soon fall in love with their models. Soon, like the quants at big investment banks, they too will start solving for their models rather than for the actual business, and that prevents the rest of the firm from getting the best out of them.

Then there is the jargon. If you say “I fitted a multinomial logistic regression and it gave me a p-value of 0.05 so this model is correct”, the business manager without much clue of numbers can be bulldozed into submission. By talking a language which most of the firm does not understand you are obscuring yourself, which leads to one of two responses from the rest. Either they deem the analytics team to be incapable (since they fail to talk the language of business, in which case the purpose of existence of the analytics team may be lost), or they assume the analytics team to be fundamentally superior (thanks to the obscurity in the language), in which case there is the risk of incorrect and possibly inappropriate models being adopted.
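As an aside on why a manager should challenge a line like “p-value of 0.05 so this model is correct”: a p-value at that level turns up by pure chance fairly often even when there is nothing to find. The sketch below is my own illustration, not anything from a real project. It assumes a plain two-sample z-test with known variance (a much simpler setting than a multinomial logistic regression), and the function name, sample sizes and seed are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of experiments on pure noise that report p < alpha.

    Each trial draws two samples of size n from the SAME standard normal
    distribution, so any "significant" difference is a false alarm.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for a difference in means, variance known to be 1
        z = (fmean(a) - fmean(b)) / math.sqrt(2.0 / n)
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided
        if p_value < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rate = false_positive_rate()
    print(f"Share of noise-only experiments with p < 0.05: {rate:.3f}")
```

By construction the share should hover around the threshold itself, which is exactly the kind of thing a numbers-conversant manager can ask about: “how often would we see this if the model were useless?”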

I can think of several solutions for this – but irrespective of what solution you ultimately adopt – whether you go completely centralized or completely distributed or a hybrid – the key step in getting the best out of your analytics is to have your senior and senior-middle management team conversant with numbers. By that I don’t mean that they all go for a course in statistics. What I mean is that your middle and senior management should know how to solve problems using numbers. When they see data, they should have the ability to ask the right kind of questions. Irrespective of how the analytics team is placed, as long as you ask them the right kind of questions, you are likely to benefit from their work (assuming basic levels of competence of course). This way, managers can remain conversant with the analytics people, and a middle ground can be established so that insights from numbers can actually flow into business.

So here is the plug for this post – shortly I’ll be launching short (1-day) workshops for middle and senior level managers in analytics. Keep watching this space :)

Me, all over the interwebs this week

Firstly, on Tuesday, I got interviewed by this magazine called Information Week. Rather, I had gotten interviewed by them a long time back but the interview appeared on Tuesday. I spoke about the challenges of election forecasting in India and the quality of surveys.

Again on Tuesday, and again on Wednesday, I wrote a pair of articles for Mint analyzing constituencies and parties. On Tuesday, I analyzed constituencies whose representatives have always belonged to ruling parties in the last 4 elections. There are 34 such constituencies. Then on Wednesday I wrote about the influence of states in the Lok Sabha, analyzing the proportion of MPs from each major state that was part of the ruling coalition.

If I had forgotten to mention earlier, I have a deal with Mint that lasts till next October where each month I’m supposed to write 3 articles on election data. You can find all my articles so far here.

Then, today, Pragati published my review of the book Why Nations Fail by Acemoglu and Robinson. More than a book on economics or institutions, it is an awesome history book. Get it.

And in the midst of all this, right here, I wrote a “worky” post about the pros and cons of having a dedicated analytics team.

And if you didn’t notice, this website now has “new clothes”. It was a rather long-pending change and the most important feature of the new layout is that it is “responsive”, and thus looks much better on smartphones. I’ve heard a couple of issues with it already, and do let me know if you have any more issues. And for the first time last night I opened this blog on an iPad and I find that it looks fantabulous, thanks to the OnSwipe plugin I use.