Bayesian Reasoning and Indian Philosophy

I’m currently reading a book called How the World Thinks: A global history of philosophy by Julian Baggini. I must admit I bought this by mistake – I was at a bookshop where I saw this book and went to the Amazon website to check reviews. And by mistake I ended up hitting buy. And before I got around to returning it, I started reading and liking it, so I decided to keep it.

In any case, this book is a nice comparative history of world philosophies, with considerable focus on Indian, Chinese, Japanese and Islamic philosophies. The author himself is trained in European/Western philosophy, but he keeps an open mind and so far it’s been an engaging read.

Rather than approaching the topic in chronological order, like some historians might have been tempted to do, this book approaches it by concept, comparing how different philosophies treat the same concept. And the description of Indian philosophy in the “Logic” chapter caught my eye, in the sense that it reminded me of Bayesian logic, and a piece I’d written a few years back.

Talking about Hindu philosophy and logic, Baggini writes:

For instance, the Veda affirms that when the appropriate sacrifice for the sake of a son is performed, a son will be produced. But it is often observed that a son is not produced, even though the sacrifice has been performed. This would seem to be pretty conclusive proof that the sacrifices don’t work and so the Veda is flawed. Not, however, if you start from the assumption that the Veda cannot be flawed.

In other words, Hindu philosophy starts with the Bayesian prior that the Veda cannot be flawed: the prior probability that it is flawed is exactly zero. Consequently, however strong the empirical evidence that the Vedas are flawed, the belief in the Vedas can never change! If, on the other hand, the prior probability that the Vedas were flawed were positive, even if infinitesimally small, then the accumulation of evidence such as the above (sacrifices that are supposed to produce sons but fail to do so) would, over time, push the probability of the Vedas being flawed upwards, and it would soon tend to 1. This is because each failed sacrifice is more likely to be observed if the Vedas are flawed than if they are not.
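
To make the argument concrete, here is a minimal Python sketch of repeated Bayesian updating. The likelihoods are purely hypothetical assumptions, not from Baggini: a failed sacrifice is assumed to have probability 0.5 if the Vedas are flawed and 0.1 if they are not (leaving room for the “sacrifice was not performed correctly” explanation). The function names are made up for illustration.

```python
def update(prior_flawed: float, p_fail_if_flawed: float, p_fail_if_sound: float) -> float:
    """One application of Bayes' rule after observing one failed sacrifice."""
    numerator = prior_flawed * p_fail_if_flawed
    denominator = numerator + (1 - prior_flawed) * p_fail_if_sound
    return numerator / denominator


def posterior_after(n_failures: int, prior_flawed: float,
                    p_fail_if_flawed: float = 0.5,   # hypothetical likelihood
                    p_fail_if_sound: float = 0.1) -> float:  # hypothetical likelihood
    """Probability that the Vedas are flawed after n failed sacrifices."""
    p = prior_flawed
    for _ in range(n_failures):
        p = update(p, p_fail_if_flawed, p_fail_if_sound)
    return p


# A prior of exactly 0 never moves; a tiny positive prior is eventually overwhelmed.
for prior in (0.0, 1e-9):
    print(prior, [round(posterior_after(n, prior), 4) for n in (0, 10, 20, 50)])
```

With a prior of 0 the numerator is always 0, so no amount of evidence changes anything; with a prior of one in a billion, each failure multiplies the odds by 5 and the posterior is close to 1 after about twenty failures.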

In 2015, I had written in Mint about how Bayesian logic can be used to explain online flame wars. There, too, I had argued that when people start with extreme opinions (probabilities equal to 0 or 1), even the strongest contrary evidence cannot get them to change their minds. And hence in online flame wars you have people simply talking past each other, because neither side is willing to update its opinions in the face of evidence.
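
The flame-war point can be sketched the same way: two people see exactly the same evidence, and we check whether their beliefs converge. This is an illustrative Python sketch only; the likelihoods (0.7 if the claim is true, 0.3 if false) and the function names are hypothetical assumptions, not taken from the Mint piece.

```python
def update(p_true: float, lik_if_true: float, lik_if_false: float) -> float:
    """Posterior probability that the claim is true after one observation."""
    num = p_true * lik_if_true
    return num / (num + (1 - p_true) * lik_if_false)


def debate(priors, n_observations, lik_if_true=0.7, lik_if_false=0.3):
    """Every participant sees the same evidence; each observation favours the claim."""
    beliefs = list(priors)
    for _ in range(n_observations):
        beliefs = [update(p, lik_if_true, lik_if_false) for p in beliefs]
    return beliefs


print(debate([0.1, 0.9], 20))  # moderates: both end up close to 1
print(debate([0.0, 1.0], 20))  # extremists: stuck exactly where they started
```

People with moderate priors end up agreeing, whereas the person who started at 0 never moves, so the gap between the two extremists never closes.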

Coming back to Hindu philosophy, this prior belief that the Vedas cannot be flawed reminds me of numerous futile arguments with some of my relatives who are of a rather religious persuasion. In each case, I presented them with what seemed like strong proof that some of their religious assumptions were flawed. And in each case, irrespective of the strength of my evidence, they refused to heed my argument. Now, given the prior of a religious Hindu – that the probability of the Vedas being flawed is 0 (not infinitesimal, but 0) – it is clear why my arguments fell on deaf ears.

In any case, Baggini goes on to say:

By this logic, if ‘a son is sure to be produced as a result of performing the sacrifice’ but a son is not produced, it can only follow that the sacrifice was not performed correctly, however much it seems that it was performed properly. By such argument, the Nyāya Sūtra can safely conclude, ‘Therefore there is no untruth in the Veda.’

Review: The Theory That Would Not Die

I was introduced to Bayes’ Theorem of Conditional Probabilities in a rather innocuous manner back when I was in Standard 12. KVP Raghavan, our math teacher, talked about pulling black and white balls out of three different boxes. “If you select a box at random, draw two balls and find that both are black, what is the probability that you selected box one?” he asked, and went on to explain the concept of Bayes’ Theorem. It was intuitive, and I accepted it as truth.
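
The contents of the teacher’s boxes aren’t recorded here, so the Python sketch below assumes three hypothetical boxes of four balls each (3 black and 1 white, 2 and 2, 1 and 3), with two balls drawn without replacement. It computes the posterior probability of each box using exact fractions.

```python
from fractions import Fraction


def p_two_black(black: int, white: int) -> Fraction:
    """Probability of drawing two black balls in a row, without replacement."""
    total = black + white
    return Fraction(black, total) * Fraction(black - 1, total - 1)


def posterior_boxes(boxes):
    """Bayes' Theorem: P(box | both black) is proportional to P(box) * P(both black | box)."""
    prior = Fraction(1, len(boxes))  # each box equally likely to be picked
    joint = [prior * p_two_black(b, w) for b, w in boxes]
    evidence = sum(joint)            # total probability of drawing two blacks
    return [j / evidence for j in joint]


boxes = [(3, 1), (2, 2), (1, 3)]  # hypothetical (black, white) counts per box
print(posterior_boxes(boxes))     # [Fraction(3, 4), Fraction(1, 4), Fraction(0, 1)]
```

Box three can never yield two black balls, so it drops out entirely, and box one is three times as likely as box two.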

I wouldn’t encounter the theorem again, however, for another four years or so, until, in a course on Communication, I came across a concept called “Hidden Markov Models”. If you were to observe a signal that could have come out of any of four different transmitters, what is the probability that it was generated by transmitter one? Once again, it was rather intuitive. And once again, I wouldn’t come across or use this theorem for a few years.

A couple of years back, I started following the blog of Columbia Statistics and Social Sciences Professor Andrew Gelman. Here, I came across the terms “Bayesian” and “non-Bayesian”. For a long time, the terms baffled me to no end. I just couldn’t see what the big deal about Bayes’ Theorem was – as far as I was concerned it was intuitive and “truth”, and I saw no reason to disbelieve it. However, Gelman frequently alluded to this topic, and used the term “frequentists” for non-Bayesians. It was puzzling that people refused to accept such an intuitive rule.

The Theory That Would Not Die is Sharon Bertsch McGrayne’s attempt to tell the history of Bayes’ Theorem. The theorem, according to McGrayne,

survived five near-fatal blows: Bayes had shelved it; Price published it but was ignored; Laplace discovered his own version but later favored his frequency theory; frequentists virtually banned it; and the military kept it secret.

The book is about the development of the theorem and associated methods over the two hundred and fifty years since Rev. Thomas Bayes first came up with it. It talks about the controversies associated with the theorem; about the people who supported, revived or opposed it; about key applications of the theorem; and about how it was frequently, and for long periods, virtually ostracized.

While the book is ostensibly about Bayes’ Theorem, it is also a story of how science develops. Bayes proposed his theorem but didn’t publish it. His friend Price put things together and published it, but without any impact. Laplace independently discovered it, but later in his life moved away from it, using frequency-based methods instead. The French army revived it and used it to determine the optimal way to fire artillery shells. But then academic statisticians shunned it, and “Bayes” became a swearword in academic circles. It saw another revival during the Second World War, helping break codes and test weapons, but all this work was classified. And then it found supporters in unlikely places – biology departments, Harvard Business School and military labs – but statistics departments continued to oppose it.

The above story is pretty representative of how a theory develops – initially it finds few takers. Then its popularity grows, but the establishment doesn’t like it. It then finds support from unusual places. Soon, this support comes from enough places to build momentum. The establishment continues to oppose it but is bypassed. Eventually everyone accepts it, but some doubters remain.

Coming back to Bayes’ Theorem – why is it controversial, and why was it ostracized for long periods of time? Fundamentally, it has to do with the definition of probability. According to “frequentists”, who should more correctly be called “objectivists”, probability is objective and based on counting. Objectivists believe that probability comes from observation and data alone, and not from subjective beliefs. If you ask an objectivist, for example, for the probability of rain in Bangalore tomorrow, he will be unable to give you an answer – “rain in Bangalore tomorrow” is not a repeatable event, and cannot be observed multiple times in order to build a model.

Bayesians, who should more correctly be called “subjectivists”, on the other hand believe that probability can also come from subjective beliefs. So it is possible to infer the probability of rain in Bangalore tomorrow from other factors – like the cloud cover in Bangalore today, or today’s maximum temperature. According to subjectivists (whose view is the currently prevailing one), probability is defined for one-time events too, and can be inferred from other, subjective factors.
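
One way to see the difference in definitions is a small Python sketch contrasting a pure counting estimate with a Bayesian one. The Beta-Binomial model used here is a standard textbook technique swapped in for illustration, not something from McGrayne’s book; the prior Beta(3, 7) (roughly “a 30% chance of rain, given today’s clouds”) and the counts of past days with similar weather are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def frequency_estimate(rainy: int, observed: int) -> Optional[Fraction]:
    """Counting only: with no comparable past days there is no answer."""
    if observed == 0:
        return None
    return Fraction(rainy, observed)


def bayesian_estimate(rainy: int, observed: int, prior_a: int, prior_b: int) -> Fraction:
    """Posterior mean of a Beta(prior_a, prior_b) prior after `rainy` of `observed` days."""
    return Fraction(prior_a + rainy, prior_a + prior_b + observed)


# Hypothetical subjective prior: about 30% chance of rain, held with the
# weight of roughly ten imaginary days.
PRIOR_A, PRIOR_B = 3, 7

print(frequency_estimate(0, 0))                    # None
print(bayesian_estimate(0, 0, PRIOR_A, PRIOR_B))   # 3/10
print(bayesian_estimate(6, 10, PRIOR_A, PRIOR_B))  # 9/20
```

With no data, the counting estimate simply doesn’t exist, while the subjective prior still gives an answer; as data arrives, the Bayesian estimate moves from the prior towards the observed frequency.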

Essentially, the battle between Bayesians and frequentists has more to do with the definition of probability than with whether it makes sense to define inverse probabilities as in Bayes’ Theorem. The theorem is controversial only because the prevailing statistical establishment did not agree with the “subjectivist” definition of probability.

There are some books that I call ‘blog-books’. These usually contain ideas that could easily be explained in a blog post, but are expanded to book length – possibly because it is easier to monetize a book-length manuscript than a blog-length one. When I first downloaded a sample of this book to my Kindle, I was apprehensive that it might also fall into that category – after all, how much can you talk about a theorem without getting too technical? However, McGrayne avoids falling into that trap. She peppers the book with interesting stories of the application of Bayes’ Theorem through the years, along with short biographical tidbits about some of the people who helped shape the theorem. Sometimes (especially towards the end) some of these examples of applications seem a bit laboured, but overall, the book sustains the reader’s interest through its length.

If I had one quibble with the book, it would be that even after telling the story of the theorem, the book talks about “Bayesian” and “non-Bayesian” camps, and about certain scientists “not doing enough to further the Bayesian cause”. For someone who is primarily interested in getting information out of data, and doesn’t care about the methods involved, it was a bit grating that scientists were graded on their “contribution to the Bayesian cause” rather than their “contribution to science”. Given the polarizing history of the theorem, however, it is perhaps not that surprising.

The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
by Sharon Bertsch McGrayne
USD 12.27 (Kindle edition)
360 pages (including appendices and notes)

The Eighty-Twenty Rule

I first got this idea during an assignment submission at IIT. One guy in our class, known to be a perfectionist, is supposed to have put 250 hours of effort into a certain course project. He is known to have got 20 out of 20 in this project. I put about 25 hours of effort into the same project and got 17. Reasonable value for effort, I thought. And that was when I grasped the law of diminishing returns to effort. That was the philosophy I carried along for the rest of my academic life (the following four years).
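
The arithmetic behind “reasonable value for effort” can be checked with a few lines of Python. The figures are those of the anecdote; treating the two students’ results as two points on the same effort-reward curve is a simplifying assumption.

```python
from fractions import Fraction

# (hours of effort, marks out of 20), as given in the anecdote
MY_PROJECT = (25, 17)
PERFECTIONIST = (250, 20)


def marks_per_hour(hours: int, marks: int) -> Fraction:
    """Average return on effort."""
    return Fraction(marks, hours)


average_mine = marks_per_hour(*MY_PROJECT)        # 17/25 = 0.68 marks per hour
average_theirs = marks_per_hour(*PERFECTIONIST)   # 20/250 = 0.08 marks per hour

# Marginal return on the extra hours, assuming both results lie on one curve
extra_hours = PERFECTIONIST[0] - MY_PROJECT[0]    # 225
extra_marks = PERFECTIONIST[1] - MY_PROJECT[1]    # 3
marginal = Fraction(extra_marks, extra_hours)     # 3/225 = 1/75, about 0.013 marks per hour

# 85% of the marks for 10% of the hours
share_of_marks = Fraction(MY_PROJECT[1], PERFECTIONIST[1])
share_of_hours = Fraction(MY_PROJECT[0], PERFECTIONIST[0])

print(float(average_mine), float(average_theirs), float(marginal))
print(float(share_of_marks), float(share_of_hours))
```

The last three marks cost 225 hours, which is roughly fifty times less productive per hour than the first seventeen.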

The problem with working life, as opposed to academic life, is that the eighty-twenty formula doesn’t work. The biggest problem here is that you are working for someone else, whereas as a student you were essentially working for yourself. Eighty was acceptable back then; it is not acceptable now. Even if you are working for yourself, the problem is that the completion-reward curve is completely different now.

Imagine a curve with the percentage of work done on the X axis and the “reward” on the Y axis. In an academic setting, it is usually linear: doing 80% of the work means that you are likely to get 80%. Fantastic. The problem with work is that the straight line gets replaced by a convex curve. So even to get an 80% reward, you might need to do 99% of the work. The curve moves up sharply towards the end so as to give 100% reward for 100% work (note that I’m talking about work done here, not effort; effort is irrelevant).
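
Since the curves are left to the imagination, here is a rough Python sketch of both. The power-law shape of the workplace curve is a hypothetical assumption; its exponent is chosen only so that 99% of the work earns 80% of the reward, as in the paragraph above.

```python
import math


def academic_reward(work: float) -> float:
    """Linear: doing 80% of the work earns 80% of the reward."""
    return work


# Hypothetical exponent, chosen so that 99% of the work earns 80% of the reward
K = math.log(0.8) / math.log(0.99)  # roughly 22.2


def workplace_reward(work: float) -> float:
    """Convex power curve: nearly flat for most of the range, rising sharply near 100%."""
    return work ** K


def work_needed(target_reward: float) -> float:
    """Invert the convex curve: fraction of the work needed for a target reward."""
    return target_reward ** (1 / K)


for w in (0.5, 0.8, 0.9, 0.99, 1.0):
    print(f"work {w:.0%}: academic {academic_reward(w):.0%}, "
          f"workplace {workplace_reward(w):.1%}")
```

Under this particular assumed shape, the college-style 80% of the work earns less than 1% of the reward; a gentler exponent would soften that, but any convex curve through (99%, 80%) and (100%, 100%) punishes stopping at 80%.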

Now, why did I cap reward at 100% in the previous paragraph? Why did I assume that there is a “maximum” amount of work that can be done? Note that if there is a ceiling on the amount of work to be done, and on the reward, then you are looking at a payoff like a bond’s – the upside is limited to 100%, but the downside is unlimited (yes, I know it is bounded at 0, but that is so far away from 100% that it can be treated as infinitely far away). Try hard, do your best each time, and the best you can do is 100%. But slip up a bit, and you will face big deficits. It is like the issuer of the bond defaulting.

Almost thirty years back, Michael Milken noticed this skewed payoff structure of bonds, and this led him to pioneer “junk bonds”, which are now more politely known as “high-yield debt”. These bonds were structured (basically, with high leverage) such that a reasonably high rate of default was built in. For an ordinary bond, the “default expectation” is that the bond won’t default at all. For a high-yield bond, the expected rate of default is much higher than 0 – so there is a definite upside if the bond doesn’t default. That balances the payoffs.
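
A stripped-down Python sketch of why a built-in default expectation creates upside: it measures each outcome against what was expected. All numbers (coupons, default probabilities, a 40% recovery rate) are hypothetical, the `Bond` class is made up for illustration, and discounting and purchase price are ignored.

```python
from dataclasses import dataclass


@dataclass
class Bond:
    coupon: float        # paid, along with the principal, if the issuer does not default
    default_prob: float  # the expectation of default that is priced in
    recovery: float      # fraction of principal recovered on default

    def expected_payoff(self) -> float:
        """Expected payoff per unit of principal."""
        return (1 - self.default_prob) * (1 + self.coupon) + self.default_prob * self.recovery

    def surprise(self, defaulted: bool) -> float:
        """Realized payoff minus expected payoff, per unit of principal."""
        realized = self.recovery if defaulted else 1 + self.coupon
        return realized - self.expected_payoff()


ordinary = Bond(coupon=0.04, default_prob=0.01, recovery=0.4)    # hypothetical
high_yield = Bond(coupon=0.12, default_prob=0.08, recovery=0.4)  # hypothetical

for name, bond in (("ordinary", ordinary), ("high yield", high_yield)):
    print(name, round(bond.surprise(False), 4), round(bond.surprise(True), 4))
```

For the ordinary bond, “no default” beats expectations by very little while a default is a large negative surprise; for the high-yield bond, the no-default surprise is nine times larger, which is the balancing of payoffs described above.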

So how does that translate to work situations? You basically need to get yourself a job where there is significant scope for doing “something extra”, so that, once you take the “something extra” into account, the “expectation” is, say, something like 90% of the work. Then by doing only a bit more than your old 80-20 rule from college, you can fulfil expectations. And occasionally you can even beat them, resulting in a major positive payoff (in terms of money, reputation, power, etc.).

The deal is that when the expectation is lower than 100%, the reward-work curve changes. It remains heavily convex up to the expectation (so if the expectation is 90% of the work for 80% of the reward, the curve will be highly convex in the region where work runs from 0 to 90% and reward from 0 to 80%). Beyond this, it gets less convex and closer to linear, and so gives you a bit more freedom.
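
The changed curve can be sketched in Python as a piecewise function. The expectation point (90% of the work for 80% of the reward) comes from the paragraph above; the exponent below the expectation and the slope above it are hypothetical assumptions, with the slope picked so that 100% of the work still earns 100% of the reward.

```python
EXPECTED_WORK = 0.9    # from the text: the expectation is 90% of the work...
EXPECTED_REWARD = 0.8  # ...for 80% of the reward
K = 10                 # hypothetical convexity below the expectation
SLOPE = 2.0            # hypothetical slope above the expectation


def reward(work: float) -> float:
    """Heavily convex up to the expectation, then linear beyond it."""
    if work <= EXPECTED_WORK:
        return EXPECTED_REWARD * (work / EXPECTED_WORK) ** K
    return EXPECTED_REWARD + SLOPE * (work - EXPECTED_WORK)


# Values above 1.0 represent the "something extra", beyond the expected work
for w in (0.45, 0.8, 0.9, 1.0, 1.1):
    print(f"work {w:.0%}: reward {reward(w):.1%}")
```

Below the expectation the curve is steep and unforgiving; above it, each extra bit of work earns a steady, proportional reward, and going beyond 100% of the expected work pushes the reward past 100%.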

I’m too lazy to draw the curves so you’ll have to imagine them in your heads. And you can find some info on convex curves here: http://en.wikipedia.org/wiki/Convex_function