I was introduced to Bayes’ Theorem of Conditional Probabilities in a rather innocuous manner back when I was in Standard 12. KVP Raghavan, our math teacher, talked about pulling black and white balls out of three different boxes. “If you select a box at random, draw two balls and find that both are black, what is the probability you selected box one?” , he asked and explained to us the concept of Bayes’ Theorem. It was intuitive, and I accepted it as truth.
I wouldn’t come across the theorem, however, for another four years or so, until in a course on Communication, I came across a concept called “Hidden Markov Models”. If you were to observe a signal, and it could have come out of four different transmitters, what are the odds that it was generated by transmitter one? Once again, it was rather intuitive. And once again, I wouldn’t come across or use this theorem for a few years.
A couple of years back, I started following the blog of Columbia Statistics and Social Sciences Professor Andrew Gelman. Here, I came across the terms “Bayesian” and “non-Bayesian”. For a long time, the terms baffled me to no end. I just couldn’t get what the big deal about Bayes’ Theorem was – as far as I was concerned it was intuitive and “truth” and saw no reason to disbelieve it. However, Gelman frequently allured to this topic, and started using the term “frequentists” for non-Bayesians. It was puzzling as to why people refused to accept such an intuitive rule.
The Theory That Would Not Die is Shannon Bertsch McGrayne’s attempt to tell the history of the Bayes’ Theorem. The theorem, according to McGrayne,
survived five near-fatal blows: Bayes had shelved it; Price published it but was ignored; Laplace discovered his own version but later favored his frequency theory; frequentists virstually banned it; and the military kept it secret.
The book is about the development of the theorem and associated methods over the last two hundred and fifty years, ever since Rev. Thomas Bayes first came up with it. It talks about the controversies associated with the theorem, about people who supported, revived or opposed it; about key applications of the theorem, and about how it was frequently and for long periods virtually ostracized.
While the book is ostensibly about Bayes’s Theorem, it is also a story of how science develops, and comes to be. Bayes proposed his theorem but didn’t publish it. His friend Price put things together and published it but without any impact. Laplace independently discovered it, but later in his life moved away from it, using frequency-based methods instead. The French army revived it and used it to determine the most optimal way to fire artillery shells. But then academic statisticians shunned it and “Bayes” became a swearword in academic circles. Once again, it saw a revival at the Second World War, helping break codes and test weapons, but all this work was classified. And then it found supporters in unlikely places – biology departments, Harvard Business School and military labs, but statistics departments continued to oppose.
The above story is pretty representative of how a theory develops – initially it finds few takers. Then popularity grows, but the establishment doesn’t like it. It then finds support from unusual places. Soon, this support comes from enough places to build momentum. The establishment continues to oppose but is then bypassed. Soon everyone accepts it, but some doubters remain..
Coming back to Bayes’ Theorem – why is it controversial and why was it ostracized for long periods of time? Fundamentally it has to do with the definition of probability. According to “frequentists”, who should more correctly be called “objectivists”, probability is objective, and based on counting. Objectivists believe that probability is based on observation and data alone, and not from subjective beliefs. If you ask an objectivist, for example, the probability of rain in Bangalore tomorrow, he will be unable to give you an answer – “rain in Bangalore tomorrow” is not a repeatable event, and cannot be observed multiple times in order to build a model.
Bayesians, who should be more correctly be called “subjectivists”, on the other hand believe that probability can also come from subjective beliefs. So it is possible to infer the probability of rain in Bangalore tomorrow based on other factors – like the cloud cover in Bangalore today or today’s maximum temperature. According to subjectivists (which is the current prevailing thought), probability for one-time events is also defined, and can be inferred from other subjective factors.
Essentially, the the battle between Bayesians and frequentists is more to do with the definition of probability than with whether it makes sense to define inverse probabilities as in Bayes’ Theorem. The theorem is controversial only because the prevailing statistical establishment did not agree with the “subjectivist” definition of probability.
There are some books that I call as ‘blog-books’. These usually contain ideas that could be easily explained in a blog post, but is expanded into book length – possibly because it is easier to monetize a book-length manuscript than a blog-length one. When I first downloaded a sample of this book to my Kindle I was apprehensive that this book might also fall under that category – after all, how much can you talk about a theorem without getting too technical? However, McGrayne avoids falling into that trap. She peppers the book with interesting stories of the application of Bayes’ Theorem through the years, and also short biographical tidbits of some of the people who helped shape the theorem. Sometimes (especially towards the end) some of these examples (of applications) seem a bit laboured, but overall, the books sustains adequate interest from the reader through its length.
If I had one quibble with the book, it would be that even after the descriptions of the story of the theorem, the book talks about “Bayesian” and ‘non-Bayesian” camps, and talk about certain scientists “not doing enough to further the Bayesian cause”. For someone who is primarily interested in getting information out of data, and doesn’t care about the methods involved, it was a bit grating that scientists be graded on their “contribution to the Bayesian cause” rather than their “contribution to science”. Given the polarizing history of the theorem, however, it is perhaps not that surprising.