Review: The Theory That Would Not Die

I was introduced to Bayes’ Theorem of Conditional Probabilities in a rather innocuous manner back when I was in Standard 12. KVP Raghavan, our math teacher, talked about pulling black and white balls out of three different boxes. “If you select a box at random, draw two balls and find that both are black, what is the probability you selected box one?” he asked, and then explained the concept of Bayes’ Theorem to us. It was intuitive, and I accepted it as truth.
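Mr Raghavan’s exact numbers aren’t given, so here is the calculation with made-up box contents (box one: 3 black and 1 white; box two: 2 black and 2 white; box three: 1 black and 3 white), drawing two balls without replacement. The sketch multiplies the prior (each box equally likely) by the chance of drawing two blacks from that box, then divides by the total across boxes, which is all Bayes’ Theorem does here. `BOXES`, `p_two_black` and `posterior` are illustrative names.

```python
from fractions import Fraction
from math import comb

# Hypothetical box contents as (black, white); the original problem's numbers are not known.
BOXES = {1: (3, 1), 2: (2, 2), 3: (1, 3)}


def p_two_black(black, white):
    """P(two balls drawn without replacement are both black) from one box."""
    return Fraction(comb(black, 2), comb(black + white, 2))


def posterior(boxes):
    """P(box | both balls black) for each box, via Bayes' Theorem."""
    prior = Fraction(1, len(boxes))  # box chosen uniformly at random
    joint = {box: prior * p_two_black(*bw) for box, bw in boxes.items()}
    evidence = sum(joint.values())   # total P(both black)
    return {box: j / evidence for box, j in joint.items()}


print(posterior(BOXES)[1])  # 3/4
```

With these contents box one is the most likely source, and box three is ruled out entirely because it holds only one black ball.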

I wouldn’t encounter the theorem again for another four years or so, until a course on Communication introduced me to a concept called “Hidden Markov Models”. If you observe a signal that could have come from any of four different transmitters, what are the odds that it was generated by transmitter one? Once again, it was rather intuitive. And once again, I wouldn’t come across or use the theorem for a few years.

A couple of years back, I started following the blog of Columbia Statistics and Social Sciences Professor Andrew Gelman. There, I came across the terms “Bayesian” and “non-Bayesian”. For a long time, the terms baffled me to no end. I just couldn’t see what the big deal about Bayes’ Theorem was – as far as I was concerned it was intuitive and “truth”, and I saw no reason to disbelieve it. However, Gelman frequently alluded to this topic, and started using the term “frequentists” for non-Bayesians. I was puzzled as to why people refused to accept such an intuitive rule.

The Theory That Would Not Die is Sharon Bertsch McGrayne’s attempt to tell the history of Bayes’ Theorem. The theorem, according to McGrayne,

survived five near-fatal blows: Bayes had shelved it; Price published it but was ignored; Laplace discovered his own version but later favored his frequency theory; frequentists virtually banned it; and the military kept it secret.

The book is about the development of the theorem and associated methods over the last two hundred and fifty years, ever since Rev. Thomas Bayes first came up with it. It talks about the controversies associated with the theorem; about people who supported, revived or opposed it; about key applications of the theorem; and about how it was frequently, and for long periods, virtually ostracized.

While the book is ostensibly about Bayes’ Theorem, it is also a story of how science develops and comes to be. Bayes proposed his theorem but didn’t publish it. His friend Price put things together and published it, but without any impact. Laplace independently discovered it, but later in his life moved away from it, using frequency-based methods instead. The French army revived it and used it to determine the optimal way to fire artillery shells. But then academic statisticians shunned it and “Bayes” became a swearword in academic circles. Once again, it saw a revival during the Second World War, helping break codes and test weapons, but all this work was classified. And then it found supporters in unlikely places – biology departments, Harvard Business School and military labs – but statistics departments continued to oppose it.

The above story is pretty representative of how a theory develops – initially it finds few takers. Then its popularity grows, but the establishment doesn’t like it. It then finds support from unusual places. Soon, this support comes from enough places to build momentum. The establishment continues to oppose it but is then bypassed. Soon everyone accepts it, but some doubters remain.

Coming back to Bayes’ Theorem – why is it controversial and why was it ostracized for long periods of time? Fundamentally it has to do with the definition of probability. According to “frequentists”, who should more correctly be called “objectivists”, probability is objective, and based on counting. Objectivists believe that probability is based on observation and data alone, and not on subjective beliefs. If you ask an objectivist, for example, the probability of rain in Bangalore tomorrow, he will be unable to give you an answer – “rain in Bangalore tomorrow” is not a repeatable event, and cannot be observed multiple times in order to build a model.

Bayesians, who should more correctly be called “subjectivists”, on the other hand believe that probability can also come from subjective beliefs. So it is possible to infer the probability of rain in Bangalore tomorrow based on other factors – like the cloud cover in Bangalore today or today’s maximum temperature. According to subjectivists (which is the current prevailing thought), probability for one-time events is also defined, and can be inferred from other subjective factors.
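A tiny sketch of the subjectivist view in action, with entirely made-up numbers: start from a personal prior of 20% for rain tomorrow, then revise it after seeing heavy cloud today, assuming heavy cloud shows up on 80% of days before rain and on 30% of days before dry weather. The helper `update` is hypothetical and simply writes Bayes’ Theorem in odds form.

```python
from fractions import Fraction


def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' Theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = p_evidence_if_true / p_evidence_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability


# Illustrative numbers only: prior P(rain) = 1/5, P(cloud | rain) = 4/5, P(cloud | dry) = 3/10.
print(update(Fraction(1, 5), Fraction(4, 5), Fraction(3, 10)))  # 2/5
```

The prior odds of 1 to 4 are multiplied by a likelihood ratio of 8/3, giving posterior odds of 2 to 3, or a 40% chance of rain. A frequentist would object to the 20% prior itself, not to the arithmetic.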

Essentially, the battle between Bayesians and frequentists has more to do with the definition of probability than with whether it makes sense to define inverse probabilities as in Bayes’ Theorem. The theorem is controversial only because the prevailing statistical establishment did not agree with the “subjectivist” definition of probability.

There are some books that I call ‘blog-books’. These usually contain ideas that could easily be explained in a blog post, but are expanded to book length – possibly because it is easier to monetize a book-length manuscript than a blog-length one. When I first downloaded a sample of this book to my Kindle I was apprehensive that it might also fall into that category – after all, how much can you talk about a theorem without getting too technical? However, McGrayne avoids that trap. She peppers the book with interesting stories of the application of Bayes’ Theorem through the years, along with short biographical tidbits about some of the people who helped shape the theorem. Sometimes (especially towards the end) some of these examples (of applications) seem a bit laboured, but overall, the book sustains the reader’s interest through its length.

If I had one quibble with the book, it would be that even after describing the story of the theorem, the book talks about “Bayesian” and “non-Bayesian” camps, and about certain scientists “not doing enough to further the Bayesian cause”. For someone who is primarily interested in getting information out of data, and doesn’t care about the methods involved, it was a bit grating that scientists were graded on their “contribution to the Bayesian cause” rather than their “contribution to science”. Given the polarizing history of the theorem, however, it is perhaps not that surprising.

The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
by Sharon Bertsch McGrayne
SD 12.27 (Kindle edition)
360 pages (including appendices and notes)

Hedgehogs and foxes: Or, a day in the life of a quant

I must state at the outset that this post is inspired by the second chapter of Nate Silver’s book The Signal and the Noise. In that chapter, which is about election forecasting, Silver draws upon the old Russian parable of the hedgehog and the fox. According to that story, the fox knows several tricks while the hedgehog knows only one – curling up into a ball. The story ends in favour of the hedgehog, as none of the tricks of the unfocused fox can help him evade the predator.

Most political pundits, says Silver, are like hedgehogs. They have just one central idea to their punditry and they tend to analyze all issues through it. A good political forecaster, however, needs to be able to accept and process any new data inputs, and include them in his analysis. With just one technique this can be hard to achieve, and so Silver says that to be a good political forecaster one needs to be a fox. While this might lead to some contradictory statements and thus bad punditry, it leads to good forecasts. Anyway, you can read more about election forecasting in Silver’s book.

The world of “quant” and “analytics” which I inhabit is similarly full of hedgehogs. You have the statisticians, whose solution for every problem is a statistical model. They can wax eloquent about maximum likelihood estimators but can have trouble explaining why you should use them in the first place. Then you have the banking quants (I used to be one of those), who are proficient in derivatives pricing, stochastic calculus and partial differential equations, but if you ask them why stock prices are generally assumed to be lognormal, they don’t have answers. Then you have the coders, who can hack, scrape and write really efficient code, but don’t know much math. And then there are the mathematicians, who can come up with elegant solutions but are divorced from reality.

While you might make a career out of falling under any one of the above categories, to truly unleash your potential as a quant you should be able to do all of these things. You should be a fox and should know each of these tricks. And unlike the fox in the old Russian fairy tale, the key to being a good fox is to know which trick to use when. Let me illustrate this with an example from my work today (the actual problem statement is masked since it involves client information).

So there were two possible distributions that a particular data point could have come from and I had to try and analyze which of them it came from (simple Bayesian probability, you might think). However, calculating the probability wasn’t so straightforward, as it wasn’t a standard function. Then I figured I could solve the probability problem using the inclusion-exclusion principle (maths again), and wrote down a mathematical formulation for it.
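The actual problem is masked, so this is only a hypothetical stand-in for the inclusion-exclusion step: the probability that at least one of several independent events occurs, computed term by term and cross-checked against the complement rule. The function names and probabilities are made up for illustration.

```python
from itertools import combinations
from math import prod


def union_prob_inclusion_exclusion(probs):
    """P(at least one of several independent events), by inclusion-exclusion.

    Adds single-event probabilities, subtracts pairwise intersections, adds back
    triple intersections, and so on; there are 2**n - 1 terms in all.
    """
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(probs, k):
            total += sign * prod(subset)  # independence: P(intersection) = product
    return total


def union_prob_complement(probs):
    """The same quantity via the complement: 1 - P(no event occurs)."""
    return 1 - prod(1 - p for p in probs)


probs = [0.1, 0.2, 0.3]
print(union_prob_inclusion_exclusion(probs), union_prob_complement(probs))
```

With independent events the complement rule is the obvious shortcut; inclusion-exclusion earns its keep when the events overlap in more complicated ways and each intersection probability has to be worked out separately, which is also why its cost grows so quickly with the number of events.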

Since I was dealing with a rather large data set, I would have to use the computer, so I turned my mathematical solution into pseudo-code. Then I realized that the pseudo-code was recursive, and given the size of the problem I would soon run out of memory. I had to figure out a solution using dynamic programming. Then, after some more code optimization, I had the probability. I then had to go back to the Bayesian analysis in order to complete the solution, and finally present it as a “business solution”, with all the above mathematical jugglery abstracted away from the client.
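Again as a hypothetical stand-in for the masked problem: suppose the data point is a count k of successes among n independent trials with unequal success probabilities, and the two candidate distributions differ only in those probabilities. Computing P(exactly k successes) by naive recursion branches twice per trial, while a dynamic-programming version builds the whole distribution one trial at a time, and its output feeds straight into the Bayesian step. All names and numbers are illustrative, not the author’s code.

```python
def count_pmf_recursive(probs, k):
    """P(exactly k successes) among independent trials, by naive recursion.

    Each call branches into "first trial succeeds" and "first trial fails",
    so the number of calls grows as 2**n and the recursion is about n levels deep.
    """
    if k < 0:
        return 0.0
    if not probs:
        return 1.0 if k == 0 else 0.0
    p, rest = probs[0], probs[1:]
    return p * count_pmf_recursive(rest, k - 1) + (1 - p) * count_pmf_recursive(rest, k)


def count_pmf_dp(probs):
    """Whole distribution of the success count, built one trial at a time in O(n**2)."""
    dist = [1.0]  # before any trial: zero successes with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)  # this trial fails
            new[j + 1] += q * p    # this trial succeeds
        dist = new
    return dist


def posterior_a(k, probs_a, probs_b, prior_a=0.5):
    """P(hypothesis A | k successes observed), where A and B assign different trial probabilities."""
    like_a = count_pmf_dp(probs_a)[k]
    like_b = count_pmf_dp(probs_b)[k]
    return prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)


print(posterior_a(3, [0.9] * 3, [0.1] * 3))
```

The dynamic-programming table replaces the exponential tree of recursive calls with a single pass over the trials, and a deep call stack with a list of at most n + 1 numbers.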

This versatility can come in handy in other places, too. There was a problem for which I figured out that the appropriate solution involved building a classification tree. However, given the nature of the data at hand, none of the off-the-shelf classification tree algorithms were ideal. So I simply went ahead and wrote my own code for creating such trees. Then I realized that classification trees are in effect built by a greedy algorithm, and can get stuck at local optima. So I added a simulated annealing component to it.
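The author’s data, tree code and annealing scheme are not described, so everything below is an assumption: a depth-2 tree on a single feature, misclassification count as the loss, and a hypothetical `anneal_tree` that perturbs all three split thresholds jointly, starting from the greedy tree. The toy data is made up to show greedy splitting getting stuck.

```python
import math
import random


def leaf_errors(labels):
    """Misclassifications if a leaf predicts its majority label."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)


def split_errors(points, t):
    """Errors of a single split (stump) at threshold t on (x, label) pairs."""
    left = [y for x, y in points if x <= t]
    right = [y for x, y in points if x > t]
    return leaf_errors(left) + leaf_errors(right)


def tree_error(points, root, left_t, right_t):
    """Errors of a depth-2 tree: split at root, then split each side again."""
    left = [p for p in points if p[0] <= root]
    right = [p for p in points if p[0] > root]
    return split_errors(left, left_t) + split_errors(right, right_t)


def greedy_tree(points, candidates):
    """Greedy growth: pick the best root split first, then the best split on each side."""
    root = min(candidates, key=lambda t: split_errors(points, t))
    left = [p for p in points if p[0] <= root]
    right = [p for p in points if p[0] > root]
    return (root,
            min(candidates, key=lambda t: split_errors(left, t)),
            min(candidates, key=lambda t: split_errors(right, t)))


def anneal_tree(points, candidates, start, steps=3000, temp0=2.0, seed=0):
    """Simulated annealing over the three thresholds, keeping the best tree seen."""
    rng = random.Random(seed)
    current, cur_err = list(start), tree_error(points, *start)
    best, best_err = list(current), cur_err
    for step in range(steps):
        temp = temp0 * (1 - step / steps) + 1e-9            # linear cooling schedule
        proposal = list(current)
        proposal[rng.randrange(3)] = rng.choice(candidates)  # move one threshold at random
        new_err = tree_error(points, *proposal)
        # Always accept improvements; accept worse trees with probability exp(-increase / temp).
        if new_err <= cur_err or rng.random() < math.exp((cur_err - new_err) / temp):
            current, cur_err = proposal, new_err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
    return tuple(best), best_err


# Toy data (made up): labels alternate in pairs along one feature.
points = list(zip(range(1, 9), [0, 0, 1, 1, 0, 0, 1, 1]))
candidates = [x + 0.5 for x in range(9)]
greedy = greedy_tree(points, candidates)
annealed, annealed_err = anneal_tree(points, candidates, greedy)
print(greedy, tree_error(points, *greedy), annealed, annealed_err)
```

On this toy data the greedy root split at 2.5 looks best on its own, but it leaves two errors whatever the second-level splits are, while the tree (4.5, 2.5, 6.5) classifies every point correctly. Annealing can accept temporarily worse trees on the way to such a solution, and because it starts from the greedy tree and keeps the best tree seen, it never ends up worse than greedy.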

While I may not have in-depth knowledge of any of the above techniques (to gain breadth you have to sacrifice depth), being aware of a wide variety of techniques means I can provide the solution that is best for the problem at hand. And as I go along, I hope to keep learning more and more techniques – even if I don’t use them, being aware of them will lead to better overall problem solving.

The Bangalore Advantage

Last night, Pinky and I had a long conversation about aunts and uncles, and why certain aunts and uncles were “cooler” or “more modern” than others. I put forward my theory that in every family there is one particular generation with a large generation gap, and while in families like mine or Pinky’s this large gap occurred in our generation, these “cooler” aunts’ and uncles’ families had the large gap one generation earlier. Of course, this didn’t go far in explaining why the gap was so large in that generation in the first place.

Then Pinky came up with a hypothesis, backed by data, that was hard to refute, and the rest of the conversation went into both of us trying to confirm it. Most of these “cool” aunts and uncles, Pinky pointed out, had spent most of their growing-up years in Bangalore, and this set them apart from the more traditional relatives, who spent at least a part of their teens outside the city. The correlation was impeccable, and in an effort to avoid the oldest mistake in statistics (mistaking correlation for causation), we sought to identify reasons that might explain this difference.

While some of the more “traditional” relatives had grown up in villages, we discovered that a large number of them had actually gone to high school/college in rather large but second-tier towns of Karnataka (including Mysore). So the rural-urban angle was out. Of course, Bangalore was so much larger than these other towns that size alone might have accounted for the difference. But the rather large gap in worldviews between those who grew up in Bangalore and those who grew up in Mysore (which, then, wasn’t that much smaller), and the rather small gap between the Mysoreans and those who grew up in small towns (like Shimoga or Bhadravati), meant that this big-city hypothesis was unfounded.

We then started talking about the kind of advantages that Bangalore (specifically) offered over other towns of Karnataka, and the real reason was soon staring us in the face. Compared to any other town in Karnataka (then, and now), Bangalore was significantly more cosmopolitan. I’ve spoken on this blog before about Bangalore having been two cities (I’ve put the LJ link rather than the NED link so that you can enjoy the comments) but the important thing was that after independence and the Britishers’ flight, the two cities got combined into one big heterogeneous city.

Relatives growing up in Mysore or Shimoga typically went to college with people from largely similar backgrounds. Everyone there spoke Kannada, and the dominance of Brahmins in those towns was so overwhelming that these relatives could get through their college lives hanging out solely with people from similar family backgrounds. This meant there was no new “cultural education” that college offered, and the world views that had prevailed in these people’s homes while they were growing up persisted.

It was rather different for people who grew up in Bangalore. Firstly, people from East Bangalore didn’t speak Kannada (at least, not particularly fluently), which meant English was the lingua franca. More importantly, there was greater religious, caste and cultural diversity in the classroom, which made it much more likely for people to interact and make friends with classmates from backgrounds rather different from their own. Back in those days of extreme cultural conservatism, this simple exposure to other cultures was invaluable in changing one’s world view and making one more liberal.

It is in the teens that one’s cultural norms are shaped, and exposure to different cultures at that age is critical to the formation of one’s world view. In our generation, this difference has probably played out in the kind of schools one went to. However, the distinction in conservatism (based on school/college/area) isn’t stark enough to support a unified theory like the one we’ve come up with here. Sticking to the previous generation, what other reasons can you think of that make certain aunts and uncles “cooler” than others?

The Trouble With Analyst Reports

The only time I watch CNBC is in the morning when I’m at the gym. For reasons not known to me, my floor in office lacks televisions (every other floor has them) and the last thing I want to do when I’m home is to watch TV, that too a business channel, hence the reservation for the gym. I don’t recollect what programme I was watching but there were some important looking people (they were in suits) talking and on the screen “Target 1200” flashed (TVs in my gym are muted).

Based on some past pattern recognition, I realized that the guy in the suit (a research analyst) was peddling the said stock and asking people to buy it. According to him, the stock price would reach 1200 (I have no clue what company this is and how much it trades for now). However, there were three important pieces of information he didn’t give me, because of which I’ll probably never take advice from him or someone else of his ilk.

Firstly, he doesn’t tell me when the stock price will reach 1200. For example, if it is 1150 today, and it is expected to reach 1200 in 12 years, I’d probably be better off putting my money in the bank, and watching it grow risk-free. Even if the current price were lower, I would want a date by which the stock is supposed to reach the target price. Good finance implies tenure matching, so I should invest accordingly. If the stock is expected to give good returns in a year, then I should put only that money into it which I would want to invest for around that much time. And so forth.
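A quick way to make the missing date concrete is to turn the price target into an annual rate. This sketch uses the 1150 and 1200 figures from the example above; the function name is made up, and the comparison with a bank deposit is left to whatever rate your bank actually offers.

```python
def annualized_return(current, target, years):
    """Compound annual growth rate needed to move from `current` to `target` in `years`."""
    return (target / current) ** (1 / years) - 1


for years in (1, 12):
    rate = annualized_return(1150, 1200, years)
    print(f"reaching 1200 from 1150 in {years} year(s) is {rate:.2%} a year")
```

Over one year the move is worth about 4.35%; stretched over twelve years it works out to roughly 0.36% a year, low enough that a risk-free bank deposit could well beat it.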

Secondly, he doesn’t tell me how long it will stay at 1200. I’m not an active investor. I might check prices of stocks that I own maybe once a week (I currently don’t own any stock). So it’s of no use to me if the price hits 1200 sometime during some intraday trade. I would want the price to remain at 1200 or higher for a longer period so that I can get out.
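The gap between “touches 1200 at some point” and “is above 1200 when I actually look” can be estimated with a small Monte Carlo sketch. It assumes (this is not from the post) that the price follows geometric Brownian motion with an illustrative 8% drift and 30% volatility, uses daily closes rather than true intraday prices, and treats every fifth trading day as a weekly check. The function `hit_probabilities` is hypothetical.

```python
import math
import random


def hit_probabilities(s0, target, mu, sigma, days=250, check_every=5,
                      n_paths=2000, seed=1):
    """Estimate two readings of "the stock will reach target" by simulation.

    Returns (p_touch, p_on_check): the fraction of paths whose daily close ever
    reaches the target, and the fraction that are at or above the target on a
    day we actually look.
    """
    rng = random.Random(seed)
    dt = 1 / 250                                 # assume 250 trading days a year
    step_mean = (mu - 0.5 * sigma ** 2) * dt     # drift of log price per day
    step_sd = sigma * math.sqrt(dt)              # volatility of log price per day
    log_target = math.log(target)
    touched = seen = 0
    for _ in range(n_paths):
        log_s = math.log(s0)
        hit_any = hit_on_check = False
        for day in range(1, days + 1):
            log_s += rng.gauss(step_mean, step_sd)  # one geometric Brownian motion step
            if log_s >= log_target:
                hit_any = True
                if day % check_every == 0:
                    hit_on_check = True
        touched += hit_any
        seen += hit_on_check
    return touched / n_paths, seen / n_paths


p_touch, p_check = hit_probabilities(s0=1150, target=1200, mu=0.08, sigma=0.3)
print(f"touches 1200 on some day: {p_touch:.0%}; above 1200 on a weekly check: {p_check:.0%}")
```

Any path that is above the target on a check day has also touched it, so the second probability can never exceed the first; how far apart the two are depends on the assumed volatility.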

Thirdly and most importantly, he doesn’t tell me anything about volatility. He doesn’t give me any statistics. He doesn’t tell me if 1200 is the expected value of the stock, or the median, or the maximum, or minimum, at whatever point of time (we’ve discussed this time bit before). He doesn’t tell me what the chances are that I’ll get the 1200 that he professes. He doesn’t tell me what I can expect out of the stock if things don’t go well. And as a quant, I refuse to touch anything that doesn’t come attached with a distribution.
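To see why a bare “target” is ambiguous, assume (a common textbook assumption, not something the analyst stated) that the price follows geometric Brownian motion. The price a year out is then lognormal, and its mean, median and percentiles all differ. The inputs below (a current price of 1000, 10% drift, 30% volatility) are made up, and `price_summary` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def price_summary(s0, mu, sigma, years, quantiles=(0.1, 0.5, 0.9)):
    """Mean and quantiles of the price after `years`, assuming geometric Brownian motion.

    Under that assumption log(S_T / s0) is normal with mean (mu - sigma**2 / 2) * years
    and standard deviation sigma * sqrt(years), so S_T is lognormal.
    """
    log_mean = (mu - 0.5 * sigma ** 2) * years
    log_sd = sigma * math.sqrt(years)
    summary = {"mean": s0 * math.exp(mu * years)}
    for q in quantiles:
        z = NormalDist().inv_cdf(q)  # standard normal quantile
        summary[f"p{round(q * 100)}"] = s0 * math.exp(log_mean + log_sd * z)
    return summary


for name, value in price_summary(s0=1000, mu=0.10, sigma=0.30, years=1).items():
    print(f"{name}: {value:.0f}")
```

With these inputs the mean comes out near 1105 while the median is only about 1057, and there is roughly a one-in-ten chance of ending below about 720, so a single “target” could mean very different things depending on which statistic the analyst had in mind.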

Life in general becomes so much better when you realize and recognize volatility (maybe I’ll save that for another discourse). It helps you set your expectations accordingly; it helps you plan for situations you may not have thought of; most importantly it allows you to recognize the value of options (not talking about financial options here; talking of everyday life situations). And so forth.

So that is yet another reason I don’t generally watch business TV. I have absolutely no use for their stock predictions and tips. And I think you too should take these tips and predictions with a pinch of salt, and not spend a fortune buying expensive reports. Just use your head. Use common sense. Recognize volatility. And risk. And you’ll do well.