## Lessons from poker party

In the past I’ve drawn lessons from contract bridge on this blog – notably, I’d described a strategy called “queen of hearts” in order to maximise chances of winning in a game that is terribly uncertain. Now it’s been years since I played bridge, or any card game for that matter. So when I got invited to a poker party over the weekend, I jumped at the invitation.

This was only the second time ever that I’d played poker in a room – I’ve mostly played online where there are no monetary stakes and you see people go all in on every hand with weak cards. And it was a large table, with at least 10 players being involved in each hand.

A couple of pertinent observations (reasonable return for the £10 I lost that night).

Firstly a windfall can make you complacent. I’m usually a conservative player, bidding aggressively only when I know that I have good chances of winning. I haven’t played enough to have mugged up all the probabilities – that probably offers an edge to my opponents. But I have a reasonable idea of what constitutes a good hand and bid accordingly.

My big drawdown happened in the hand immediately after I’d won big. After an hour or so of bleeding money, I’d suddenly more than broken even. That meant that in my next hand, I bid a bit more aggressively than I would have for what I had. For a while I managed to stay rational (after the flop I knew I had a 1/6 chance of winning big, and having mugged up the Kelly Criterion on my way to the party, bid accordingly).

And when the turn wasn’t to my liking I should’ve just gotten out – the (approx) percentages didn’t make sense any more. But I simply kept at it, falling for the sunk cost fallacy (what I’d put in thus far in the hand). I lost some 30 chips in that one hand, of which at least 21 came at the turn and the river. Without the high of having won the previous hand, I would’ve played more rationally and lost only 9. After all the lectures I’ve given on logic, correlation-causation and the sunk cost fallacy, I’m sad I lost so badly because of the last one.
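The Kelly Criterion mentioned above can be sketched in a few lines. The 1/6 win probability is from the hand described; the 8-to-1 payout odds are invented for illustration and not from the actual game.

```python
def kelly_fraction(p, b):
    """Optimal fraction of the bankroll to bet when you win with
    probability p and are paid b-to-1 on a win."""
    q = 1 - p  # probability of losing
    return (b * p - q) / b

# Hypothetical numbers: 1/6 chance of winning (as in the hand above),
# 8-to-1 payout (made up for illustration).
f = kelly_fraction(1 / 6, 8)
print(f"Bet {f:.1%} of your stack")
```

A negative result means you have no edge and shouldn’t bet at all – which is precisely the “get out when the percentages don’t make sense” rule I failed to follow at the turn.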

The second big insight is that poverty leads to suboptimal decisions. Now, this is a well-studied topic in economics but I got to experience it first hand during the session. This was later on in the night, as I was bleeding money (and was down to about 20 chips).

I got pocket aces (a pair of aces in hand) – something I should’ve bid aggressively with. But with the first 3 open cards falling far away from the face cards and being uncorrelated, I wasn’t sure of the total strength of my hand (mugging up probabilities would’ve helped for sure!). So when I had to put in 10 chips to stay in the hand, I baulked, and folded.

Given the play on the table thus far, it was definitely a risk worth taking, and with more in the bank, I would have. But poverty and the Kelly Criterion meant that the number of chips that I was able to invest in the arguably strong hand was limited, and that limited my opportunity to profit from the game.

It is no surprise that the rest of the night petered out for me as my funds dwindled and my ability to play diminished. Maybe I should’ve bought in more when I was down to 20 chips – but then given my ability relative to the rest of the table, that would’ve been good money after bad.

## Bayesian recognition in baby similarity

When people come to see small babies, it’s almost like they’re obliged to offer their opinions on who the child looks like. Most of the time it’s an immediate ancestor – either a parent or grandparent. Sometimes it could be a cousin or aunt or uncle as well. Thankfully it’s uncommon to compare babies’ looks to those who they don’t share genes with.

So as people have come up and offered their opinions on who our daughter looks like (I’m top seed, I must mention), I’ve been trying to analyse how they come up with their predictions. And as I observe the connections between people making the observations, and who they mention, I realise that this too follows some kind of Bayesian Recognition.

Basically, different people who come to see the baby have different amounts of information on how each of the baby’s ancestors looked. A recent friend of mine, for example, will only know how my wife and I look. An older friend might have some idea of how my parents looked. A relative might have a better judgment of how one of my parents looked than of how I looked.

So based on their experiences in recognising different people in and around the baby’s immediate ancestry, they effectively start with a prior distribution of who the baby looks like. And then when they see the baby, they update their priors, and then mention the person with the highest posterior probability of matching the baby’s face and features.

Given that posterior probability is a function of prior probability, there is no surprise that different people will disagree on who the baby looks like. After all, each of their private knowledge of the baby’s ancestry’s idiosyncratic faces, and thus their priors, will be different!
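The update can be made concrete with a toy calculation. All the numbers below are invented: a visitor’s prior (driven by whose faces they know well) times a likelihood (how well the baby’s features seem to match) gives a posterior, and the visitor announces whoever has the highest posterior.

```python
# Hypothetical visitor: knows the parents well, the grandmother vaguely.
prior = {"mother": 0.5, "father": 0.4, "grandmother": 0.1}
# Made-up likelihoods: P(baby's observed features | baby "takes after" X)
likelihood = {"mother": 0.3, "father": 0.6, "grandmother": 0.4}

unnorm = {k: prior[k] * likelihood[k] for k in prior}
total = sum(unnorm.values())
posterior = {k: v / total for k, v in unnorm.items()}

# The visitor names the person with the highest posterior probability
verdict = max(posterior, key=posterior.get)
print(verdict, round(posterior[verdict], 3))
```

Note how a visitor with a different prior (say, an old friend of my mother’s) could look at the same face and reach a different verdict – which is the whole point.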

Unrelated, but staying on Bayesian reasoning, I recently read this fairly stud piece in Aeon on why stereotyping is not necessarily a bad thing. The article argues that in the absence of further information, stereotypes help us form a good first prior, and that stereotypes only become a problem if we fail to update our priors with any additional information we get.

## Movie plots and low probability events

First of all I don’t watch too many movies. And nowadays, watching movies has become even harder as I try to second-guess the plot.

Fundamentally, commercial movies like to tell stories that are spectacular, which means they should consist of low-probability events. Think of defusing bombs when there is 1 second left on the timer, for example, or the heroine’s flight getting delayed just so that the hero can catch her at the airport.

Now, the entire plot of the movie cannot consist of such low-probability events, for that will make the movie utterly implausible, and people won’t like it. Moreover, a few minutes into such a movie, the happenings won’t be low probability any more.

So the key is to intersperse high-probability events with low-probability events so that the viewer’s attention is maintained. There are many ways to do this, but as Kurt Vonnegut once wrote (in his master’s thesis, no less), there are a few basic shapes that stories take. These shapes are popular methods in which high and low-probability events get interspersed so that the movie will be interesting.

So once you understand that there are certain “shapes” that stories take, you can try and guess how a movie’s plot will unfold. You make a mental note of the possible low-probability events that could happen, and with some practice, you will know how the movie will play out.

In an action movie, for example, there is a good chance that one (or more) of the “good guys” dies at the end. Usually (but not always), it is not the hero. Analysing the other characters in his entourage, it shouldn’t normally be hard to guess who will bite the dust. And when the event inevitably happens, it’s not surprising to you any more!

Similarly, in a romantic movie, unless you know that the movie belongs to a particular “type”, you know that the guy will get the girl at the end of the movie. And once you can guess that, it is not hard to guess what improbable events the movie will consist of.

Finally, based on some of the action movies I’ve watched recently (not many, mind you, so there is a clear small-sample bias here), most of their plots can be explained by one simple concept. Rather than spelling it out in words, I’ll let you watch this scene from The Good, The Bad and The Ugly.

## Horses, Zebras and Bayesian reasoning

David Henderson at Econlog quotes a doctor on a rather interesting and important point, regarding Bayesian priors. He writes:

Later, when I went to see his partner, my regular doctor, to discuss something else, I mentioned that incident. He smiled and said that one of the most important lessons he learned from one of his teachers in medical school was:

When you hear hooves, think horses, not zebras.

This was after Henderson had some symptoms that are correlated with a heart attack, panicked and called his doctor, got treated for gas trouble, and was absolutely fine after that.

Our problem is that when we have symptoms that are correlated with something bad, we immediately assume that it’s the bad thing that has happened, and panic. In the process we neither consider alternative explanations nor do a Bayesian analysis.

Let me illustrate with a personal example. Back when I was a schoolboy, and I wouldn’t return home from school at the right time, my mother would panic. This was the time before cellphones, remember, and she would just assume that “the worst” had happened and that I was in trouble somewhere. Calls would go to my father’s office, and he would ask her to wait, though to my credit I was never so late that they had to take any further action.

Now, coming home late from school can happen due to a variety of reasons. Let us eliminate reasons such as wanting to play basketball for a while before returning – since such activities were “usual” and had been budgeted for. So let’s assume that there are two possible reasons I’m late – the first is that I had gotten into trouble – I had either been knocked down on my way home or gotten kidnapped. The second is that the BTS (Bangalore Transport Service, as it was then called) schedule had gone completely awry, thanks to which I had missed my usual set of buses, and was thus delayed. Note that me not turning up at home until a certain point of time was a symptom of both of these.

Having noticed such a symptom, my mother would automatically jump to the “worst case” conclusion (that I had been knocked down or kidnapped), and panic. But I’m not sure that was the rational reaction. What she should have done was a Bayesian analysis, and used that to guide her panic.

Let A be the event that I’d been knocked over or kidnapped, and B be the event that the bus schedule had gone awry. Let L(t) be the event that I haven’t gotten home till time t, and that such an event has been “observed”. The question is: with L(t) having been observed, what are the odds of A and B having happened? Bayes’ Theorem gives us an answer. The equation is rather simple:

$P(A|L(t)) = \frac{P(A) \cdot P(L(t)|A)}{P(A) \cdot P(L(t)|A) + P(B) \cdot P(L(t)|B)}$

$P(B|L(t))$ is just one minus the above quantity (we assume that there is nothing else that can cause L(t)).

So now let us give values. I’m too lazy to find the data now, but let’s say we find from the national crime data that the odds of a fifteen-year-old boy being in an accident or kidnapped on a given day are one in a million. And if that happens, then L(t) obviously gets observed. So we have

$P(A) = \frac{1}{1000000}$
$P(L(t) | A) = 1$

The BTS was notorious back in the day for its delayed and messed-up schedules. So let us assume that P(B) is $\frac{1}{100}$. Now, $P(L(t)|B)$ is tricky, and is the reason the (t) qualifier has been added to L. The larger t is, the smaller the value of $P(L(t)|B)$. If there is a bus schedule breakdown, there is probably a 50% probability that I’m not home an hour after “usual”. But there is only a 10% probability that I’m not home two hours after “usual” because of a bus breakdown. So

$P(L(1)|B) = 0.5$
$P(L(2)|B) = 0.1$

Now let’s plug in and based on how delayed I was, find the odds that I was knocked down/kidnapped. If I were late by an hour,
$P(A|L(1)) = \frac{\frac{1}{1000000} \cdot 1}{\frac{1}{1000000} \cdot 1 + \frac{1}{100} \cdot 0.5}$
or $P(A|L(1)) = 0.00019996$. In other words, if I wasn’t home an hour after my usual time, the odds that I had been knocked down or kidnapped were just one in five thousand!

What if I still wasn’t home two hours after my normal time? Again we can plug into the formula, and here we find that $P(A|L(2)) = 0.000999$, or one in a thousand! So notice that the later I am, the higher the odds that I’m in trouble. Yet the numbers (admittedly based on the handwaving assumptions above) are small enough for us not to worry!
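The arithmetic above is easy to check in a few lines, using the same handwaving numbers:

```python
def p_trouble(p_a, p_l_given_a, p_b, p_l_given_b):
    """P(A | L(t)) via Bayes' theorem, assuming A and B are
    the only possible causes of the lateness L(t)."""
    num = p_a * p_l_given_a
    return num / (num + p_b * p_l_given_b)

p_a = 1 / 1_000_000   # knocked down or kidnapped on a given day
p_b = 1 / 100         # bus schedule breakdown

one_hour = p_trouble(p_a, 1, p_b, 0.5)   # P(L(1)|B) = 0.5
two_hours = p_trouble(p_a, 1, p_b, 0.1)  # P(L(2)|B) = 0.1
print(f"{one_hour:.8f}")   # ≈ 0.0002, about 1 in 5000
print(f"{two_hours:.6f}")  # ≈ 0.001, about 1 in 1000
```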

Bayesian reasoning has its implications elsewhere, too. There is the medical case, as Henderson’s blogpost illustrates. Then we can use this to determine whether a wrong act was due to stupidity or due to malice. And so forth.

But what Henderson’s doctor told him is truly an immortal line:

When you hear hooves, think horses, not zebras.

## Should you stop flying Malaysian?

So Malaysian Airlines faced its second tragedy in four months when its flight MH17 was shot down over Eastern Ukraine yesterday. In response to this terrorist attack, the stock price of Malaysian Airlines dropped sharply in today’s trading. Given that the airline has faced two tragedies in quick succession, the question is whether you should stop flying the airline, and whether the price crash is justified.

The basic question we need to ask ourselves before we book our next ticket is the probability of that Malaysian flight crashing vis-a-vis the probability of a flight belonging to another airline crashing. Now, one never knows what happened to MH370, but most reports (months after the disappearance) point to either sabotage or a terrorist attack. Based on analysis and reports so far, it is extremely unlikely that MH370 disappeared on account of any technical or security lapse on the part of the airline.

Coming to MH17, which was shot down over Ukraine, again it must be recognized that the aircraft went down thanks to a terrorist attack. It must also be pointed out that the attack came from the ground and not from on board, and that there is nothing to indicate any technical or security lapse on the part of the airline that led to it.

Moreover, given that neither Malaysia nor the Netherlands (MH17 took off from Amsterdam) has anything to do with either side of the Ukraine conflict, it can be assumed that the targeting of Malaysian Airlines in yesterday’s attack was incidental. It is more likely that the terrorists wanted to either shoot down a Russian or Ukrainian aircraft for a particular reason and took down the Malaysian flight by mistake, or just wanted to show their intent by shooting down some aircraft. Based on this, we can say with very high confidence that the choice of a Malaysian Airlines flight as last night’s target was purely incidental.

Based on this analysis, it is unlikely that there is something specific about Malaysian Airlines that has led to the two accidents in the recent past. In this light, fear of flying Malaysian is irrational, and there is no reason to believe that a Malaysian flight is going to be less safe than a flight of another airline. So if you are flying on a route that is served by Malaysian, after accounting for cost and time and other “normal” factors of consideration, there is no reason why you should prefer to fly another airline rather than Malaysian.

And should you fly at all? If it’s a route that you would normally travel by flight, you most definitely should, for on a passenger-kilometre basis, traveling by flight is definitely safer than traveling by car.

Then what about the markets? The stock price of MH has tanked because the market believes that people are going to fly MH less. Considering that most people are irrational, this is a fair judgment to make, and so one can say that the stock price crash is justified. However, unless something untoward happens (which can actually be traced back to incompetence on the part of MH), it is likely that the fall in MH traffic will be much lower than what the markets expect, so it might make sense to buy the stock today – if you have the opportunity to do so. And as a passenger, MH fares are likely to get more competitive in the near term, so you might want to take advantage of that also!

## Review: The Theory That Would Not Die

I was introduced to Bayes’ Theorem of Conditional Probabilities in a rather innocuous manner back when I was in Standard 12. KVP Raghavan, our math teacher, talked about pulling black and white balls out of three different boxes. “If you select a box at random, draw two balls and find that both are black, what is the probability you selected box one?” , he asked and explained to us the concept of Bayes’ Theorem. It was intuitive, and I accepted it as truth.

I wouldn’t come across the theorem, however, for another four years or so, until in a course on Communication, I came across a concept called “Hidden Markov Models”. If you were to observe a signal, and it could have come out of four different transmitters, what are the odds that it was generated by transmitter one? Once again, it was rather intuitive. And once again, I wouldn’t come across or use this theorem for a few years.

A couple of years back, I started following the blog of Columbia Statistics and Social Sciences Professor Andrew Gelman. Here, I came across the terms “Bayesian” and “non-Bayesian”. For a long time, the terms baffled me to no end. I just couldn’t get what the big deal about Bayes’ Theorem was – as far as I was concerned it was intuitive and “truth”, and I saw no reason to disbelieve it. However, Gelman frequently alluded to this topic, and started using the term “frequentists” for non-Bayesians. It was puzzling as to why people refused to accept such an intuitive rule.

The Theory That Would Not Die is Sharon Bertsch McGrayne’s attempt to tell the history of Bayes’ Theorem. The theorem, according to McGrayne,

survived five near-fatal blows: Bayes had shelved it; Price published it but was ignored; Laplace discovered his own version but later favored his frequency theory; frequentists virtually banned it; and the military kept it secret.

The book is about the development of the theorem and associated methods over the last two hundred and fifty years, ever since Rev. Thomas Bayes first came up with it. It talks about the controversies associated with the theorem, about people who supported, revived or opposed it; about key applications of the theorem, and about how it was frequently and for long periods virtually ostracized.

While the book is ostensibly about Bayes’ Theorem, it is also a story of how science develops, and comes to be. Bayes proposed his theorem but didn’t publish it. His friend Price put things together and published it but without any impact. Laplace independently discovered it, but later in his life moved away from it, using frequency-based methods instead. The French army revived it and used it to determine the optimal way to fire artillery shells. But then academic statisticians shunned it and “Bayes” became a swearword in academic circles. Once again, it saw a revival in the Second World War, helping break codes and test weapons, but all this work was classified. And then it found supporters in unlikely places – biology departments, Harvard Business School and military labs – but statistics departments continued to oppose it.

The above story is pretty representative of how a theory develops – initially it finds few takers. Then popularity grows, but the establishment doesn’t like it. It then finds support from unusual places. Soon, this support comes from enough places to build momentum. The establishment continues to oppose but is then bypassed. Soon everyone accepts it, but some doubters remain.

Coming back to Bayes’ Theorem – why is it controversial and why was it ostracized for long periods of time? Fundamentally it has to do with the definition of probability. According to “frequentists”, who should more correctly be called “objectivists”, probability is objective, and based on counting. Objectivists believe that probability is based on observation and data alone, and not from subjective beliefs. If you ask an objectivist, for example, the probability of rain in Bangalore tomorrow, he will be unable to give you an answer – “rain in Bangalore tomorrow” is not a repeatable event, and cannot be observed multiple times in order to build a model.

Bayesians, who should more correctly be called “subjectivists”, on the other hand believe that probability can also come from subjective beliefs. So it is possible to infer the probability of rain in Bangalore tomorrow based on other factors – like the cloud cover in Bangalore today or today’s maximum temperature. According to subjectivists (which is the current prevailing thought), probability for one-time events is also defined, and can be inferred from other subjective factors.

Essentially, the battle between Bayesians and frequentists is more to do with the definition of probability than with whether it makes sense to define inverse probabilities as in Bayes’ Theorem. The theorem is controversial only because the prevailing statistical establishment did not agree with the “subjectivist” definition of probability.

There are some books that I call ‘blog-books’. These usually contain ideas that could easily be explained in a blog post, but are expanded to book length – possibly because it is easier to monetize a book-length manuscript than a blog-length one. When I first downloaded a sample of this book to my Kindle I was apprehensive that it might also fall under that category – after all, how much can you talk about a theorem without getting too technical? However, McGrayne avoids falling into that trap. She peppers the book with interesting stories of the application of Bayes’ Theorem through the years, and also short biographical tidbits of some of the people who helped shape the theorem. Sometimes (especially towards the end) some of these examples seem a bit laboured, but overall the book sustains adequate interest from the reader through its length.

If I had one quibble with the book, it would be that even after telling the story of the theorem, the book talks about “Bayesian” and “non-Bayesian” camps, and about certain scientists “not doing enough to further the Bayesian cause”. For someone who is primarily interested in getting information out of data, and doesn’t care about the methods involved, it was a bit grating that scientists be graded on their “contribution to the Bayesian cause” rather than their “contribution to science”. Given the polarizing history of the theorem, however, it is perhaps not that surprising.

The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
by Sharon Bertsch McGrayne
USD 12.27 (Kindle edition)
360 pages (including appendices and notes)

## Religion and Probability

If only people were better at mathematics in general and probability in particular, we might not have had religion

Last month I was showing my mother-in-law the video of the meteor that fell in Russia causing much havoc, and soon the conversation drifted to why the meteor fell where it did. “It is simple mathematics that the meteor fell in Russia”, I declared, trying to show off my knowledge of geography and probability, arguing that Russia’s large landmass made it the most probable country for the meteor to fall in. My mother-in-law, however, wasn’t convinced. “It’s all god’s choice”, she said.

Recently I realized the fallacy in my argument. While the meteor was more likely to fall in Russia than in any other country, there was no good scientific reason to explain why it fell at the exact place it did. It could just as likely have fallen in any other place. It was just a matter of chance that it fell where it did.
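The landmass argument can be made rough-and-ready with published area figures (in km², rounded). Treating the impact point as uniformly distributed over the Earth’s surface is of course itself a simplification.

```python
# Rough, rounded published figures in square kilometres
earth_surface = 510_000_000  # total surface area of the Earth
earth_land = 149_000_000     # total land area
russia_area = 17_100_000     # Russia, the largest country by area

p_russia = russia_area / earth_surface          # any point on Earth
p_russia_given_land = russia_area / earth_land  # given it hit land at all
print(f"{p_russia:.1%}, {p_russia_given_land:.1%}")  # roughly 3.4% and 11.5%
```

So Russia is indeed the single most probable country, but at around one in nine even conditional on hitting land, it is hardly a sure thing – which is the point about randomness.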

Falling meteors are not the only events in life that happen with a certain degree of randomness. There are way too many things that are beyond our control which happen when they happen and the way they happen for no good reason. And the kicker is that it all just doesn’t average out. Think about the meteor itself for example. A meteor falling is such a rare event that it is unlikely to happen (at least with this kind of impact) again in most people’s lifetimes. This can be quite confounding for most people.

Every time I’ve studied probability (be it in school or engineering college or business school), I’ve noticed that most people have much trouble understanding it. I might be generalizing based on my cohort but I don’t think it would be too much of a stretch to say that probability is not the easiest of subjects to grasp for most people. Which is a real tragedy given the amount of randomness that is a fixture in everyone’s lives.

Because of the randomness inherent in everyone’s lives, and because most of these random events don’t really average out in people’s lifetimes, people find the need to call upon an external entity to explain these events. And once the existence of one such entity is established, it is only natural to attribute every random event to the actions of this entity.

And then there is the oldest mistake in statistics – assuming that if two events happen simultaneously or one after another, one of the events is the cause for the other. (I’m writing this post while watching football) Back in 2008-09, the last time Liverpool FC presented a good challenge for the English Premier League, I noticed a pattern over a month where Liverpool won all the games that I happened to watch live (on TV) and either drew or lost the others. Being rather superstitious, I immediately came to the conclusion that my watching a game actually led to a Liverpool victory. And every time that didn’t happen (that 2-2 draw at Hull comes to mind) I would try to rationalize that by attributing it to a factor I had hitherto left out of “my model” (like I was seated on the wrong chair or that my phone was ringing when a goal went in or something).
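A quick sanity check shows how easily such a “pattern” arises by pure chance. The numbers below are invented: suppose Liverpool win 60% of their matches in a good season, and I happen to watch five of them, with no causal link whatsoever.

```python
# Made-up win rate for a good Liverpool season; no causation anywhere
p_win = 0.6
games_watched = 5

# Probability that, by sheer coincidence, every watched game is a win
p_all_wins = p_win ** games_watched
print(f"{p_all_wins:.1%}")  # about 7.8% -- unlikely, but hardly miraculous
```

A one-in-thirteen coincidence is exactly the kind of event that happens to somebody somewhere all the time, and the somebody it happens to then builds a superstition around it.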

So you have a number of events which happen the way they happen randomly, and for no particular reason. Then, you have pairs of events that for random reasons happen in conjunction with one another, and the human mind that doesn’t like un-explainable events quickly draws a conclusion that one led to the other. And then when the pattern breaks, the model gets extended in random directions.

Randomness leads you to believe in an external entity who is possibly choreographing the world. When enough of you believe in one such entity, you come up with a name for the entity, for example “God”. Then people come up with their own ways of appeasing this “God”, in the hope that it will lead to “God” choreographing events in their favour. Certain ways of appeasement happen simultaneously with events favourable to the people who appeased. These ways of appeasement are then recognized as legitimate methods to appease “God”. And everyone starts following them.

Of course, the experiment is not repeatable – for the results were purely random. So people carry out activities to appease “God” and yet experience events that are unfavourable to them. This is where model extension kicks in. Over time, certain ways of model extension have proved more convincing than others, the most common one (at least in India) being ‘“God” is doing this to me because he/she wants to test me’. Sometimes these model extensions also fail to convince. However, the person has so much faith in the model (it has, after all, been handed down by his/her ancestors, and surely a wrong model could not have propagated?) that he/she is not willing to question the model, and tries instead to extend it further in another random direction.

In different parts of the world, different methods of appeasement to “God” happened in conjunction with events favourable to the appeasers, and so this led to different religions. Some people whose appeasements were correlated with favourable events had greater political power (or negotiation skills) than others, so the methods of appeasement favoured by the former grew dominant in that particular society. Over time, mostly due to political and military superiority, some of these methods of appeasement grew disproportionately, and others lost their way. And we had what are now known as “major religions”. I don’t need to continue this story.

So going back, it all once again boils down to the median man’s poor understanding of probability and randomness, and the desire to explain all possible events. Had human understanding of probability and randomness been superior, it is possible that religion would not have existed at all!