What’s in a shirt number?

There is a traditional way of allotting shirt numbers to football players. “Back to front, right to left”, goes the rule. The goalkeeper is thus number 1. Irrespective of the system used, the right back is number 2, and usually the left forward/winger is number 11.

Now, the way different teams allot numbers depends upon their historical formations, and how their current formations have evolved from those historical formations. The two historical formations are the 2-3-5 (mostly played in Europe) and the W-M (which originated in South America).

You can read Jonathan Wilson’s excellent Inverting the Pyramid to find out more about how formations evolved. This post, however, is about shirt numbers in the ongoing World Cup.

Now, given the way numbering has evolved in different countries, each number (between 1 and 11) has a traditional set of roles associated with it. 1 is the goalie everywhere, 2 is right back everywhere. 3 is left back in Europe, but right centre back in South America. 4 is central midfielder in England, but centre back in Spain and South America. 6 is a central defender in England, left back in Brazil/Argentina and a midfielder in Spain.

These are essentially numbering conventions based on how numbering systems have evolved, but are seldom a rule. However, such conventions are so ingrained in the traditional football watcher’s mind that when a player wears a shirt number that is not normally associated with his position, it appears “wrong”.

For example, William Gallas, a centre back (and occasional right back for Chelsea) by trade, moved to Arsenal in 2006 and promptly got the number 10 shirt, which is usually reserved for a central attacker/attacking midfielder (in fact, the number now defines the role: it is simply called the “number 10 role”). Last season, West Ham used two successive left backs (Razvan Rat and Pablo Armero), and both were allotted number 8, traditionally allocated to a central midfielder.

In this post, we will look at the squads of the ongoing World Cup and try to understand how many players are wearing “wrong” shirt numbers. In order to do this, we look at the most common roles associated with a particular number, and identify any players that don’t fit this convention.

Figures 1 and 2 have the summary of the distribution of roles according to shirt number.

shirt1

 

shirt2

As we can see, all number 1s are goalkeepers (perhaps there is a FIFA rule to this effect). Most number 2s and 3s are defenders, but the odd midfielder or forward also wears these numbers. Iranian forward Khosro Heydari wears 2, as do Greek midfielder Ioannis Maniatis and Bosnian midfielder Avdija Vrsajevic.

The most unnatural number 3 (in his defence, he’s always worn 3) is Ghanaian striker Asamoah Gyan. Iranian midfielder Ehsan Haji Safi also wears a 3, contrary to convention.

As discussed earlier, midfielders from a few countries wear 4, but there are also two forwards who wear that number – Japanese Keisuke Honda and Australian Tim Cahill. This can be explained by the fact that both of them started off as midfielders, and then turned into forwards, but perhaps wanted to keep their original numbers.

5 is split entirely between defenders and midfielders, who also make up most of the number 6s. The one exception to this is Russia’s Maksim Kanunnikov, who is a forward. Interestingly, as many as six number 7s (associated with a right winger in both 2-3-5 and W-M systems) are listed as defenders! This includes Colombia’s left back Armero, who notoriously wore 8 for West Ham last year. This might possibly be explained by players who started off as wingers and then moved back, but kept their numbers. Two defenders, Costa Rica’s Heiner Mora and Australia’s Bailey Wright, wear number 8.

Number 9 is again one of those numbers which is associated with a specific role: a centre forward. In fact, in recent times, there is a variation of this called the “false nine” (there is also a “false ten” now). We would thus expect that all number nines are forwards, but a few midfielders also get that number. Prominent among those is Newcastle’s Cheick Tiote, who wears 9 for Cote D’Ivoire.

10 is split between midfielders and forwards (as expected), but a few defenders wear 11. Croatian captain and right back Darijo Srna wears 11, as does Greek defender Loukas Vyntra.

Beyond 11, there is no real convention in terms of shirt numbering. The only interesting thing is in the numbers allotted to the reserve goalkeepers (notice that no goalies take any number between 2 and 11). By far, 12 is the most popular number allotted to the reserve goalkeeper, but some teams use 13 as well. Then, 22 and 23 are also pretty popular numbers for goalkeepers.

Finally, we saw that Iran was the culprit in allocating numbers 2 and 3 to non-defenders. Greece, too, came up as a repeat offender in terms of allocating inappropriate numbers. Can we build a “number convention index” and see which countries deviated most from the numbering conventions?

Now, there are degrees in being unconventional, and these need to be accommodated into the analysis. For example, a midfielder wearing 4 (there are 6 of them) is pretty normal, but a forward wearing 3 is plain wrong. A forward wearing 8 is not “correct”, but not “wrong” either, which shows that we need more than a simple binary scoring system.

What we will do is first identify the most common player type for each number, and every such player will get a score of 1. For every other player wearing that number, the score will be the number of players of his type wearing that number, divided by the number of players wearing that number who occupy the most popular position for that number.

I’m assuming the last paragraph didn’t make sense so let me use an example. Take number 2: the most popular position for a number 2 is in defence, so every defender who wears 2 gets 1 point. There are two midfielders who wear 2, compared to 29 defenders who wear 2. Thus, each midfielder who wears 2 gets 2/29 points. One forward wears 2, and he gets 1/29 point.

Taking number 10, the most common position for the number is forward (there are 17 of them), and they all get 1 point. The remaining 15 players who wear 10 are all midfielders, and they get 15/17 points (notice this is not so much less than 1).

This way, each member of each squad gets allotted points based on how “normal” his shirt number is given his position. Summing up the points across players of a team, we get a team score on how “natural” the shirt numbers are. The maximum score a team can get is 23 (each player wearing a number appropriate for his position).
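For those who prefer code to words, here is a rough sketch of the scoring scheme described above. It assumes the squad data is available as (country, shirt number, position) tuples; the function name `convention_scores`, the position labels and the toy data are all made up for illustration and are not the actual Guardian data.

```python
from collections import Counter, defaultdict


def convention_scores(players):
    """Score each player and each team on how conventional the shirt numbers are.

    players: list of (country, shirt_number, position) tuples.
    A player's score is the number of players sharing his number and position,
    divided by the number of players in the most common position for that number.
    """
    # Count how many players hold each position, separately for every number.
    counts = defaultdict(Counter)
    for _, number, position in players:
        counts[number][position] += 1

    player_scores = []
    team_scores = defaultdict(float)
    for country, number, position in players:
        most_common = max(counts[number].values())
        score = counts[number][position] / most_common
        player_scores.append((country, number, position, score))
        team_scores[country] += score
    return player_scores, dict(team_scores)


# Toy data mimicking the number 2 example: 29 defenders, 2 midfielders, 1 forward.
toy = (
    [("Team%d" % i, 2, "DF") for i in range(29)]
    + [("Greece", 2, "MF"), ("Bosnia", 2, "MF"), ("Iran", 2, "FW")]
)
_, toy_team_scores = convention_scores(toy)
print(toy_team_scores["Greece"], toy_team_scores["Iran"])  # 2/29 and 1/29
```

With full squads of 23, summing the per-player scores by country gives the team score, with 23 as the maximum.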

Table 3 here has the team-wise information on correctness of shirt numbers. The team with the worst allocated shirt numbers happens to be Nigeria with 16.13. At the other end, the team that has allocated numbers most appropriately is Ecuador, with 21.01.

Country  Score
Nigeria         16.13
Costa Rica         16.85
Greece         17.22
Australia         17.45
Iran         17.69
Ivory Coast         17.74
Colombia         18.19
Cameroon         18.19
Argentina         18.29
USA         18.41
Italy         18.63
Portugal         18.66
Algeria         18.68
Honduras         18.69
France         19.02
Ghana         19.06
Croatia         19.08
Netherlands         19.20
Mexico         19.23
Brazil         19.28
Chile         19.37
Japan         19.52
South Korea         19.69
England         19.77
Russia         20.03
Switzerland         20.04
Uruguay         20.25
Spain         20.27
Bosnia & Herzegovina         20.37
Belgium         20.83
Germany         20.86
Ecuador         21.01

 

This, however, may not tell the complete story. As we saw earlier, conventions regarding numbers between 12 and 23 are not as strict, and thus these numbers can get allocated in a more random fashion compared to 1-11. There are absolutely no taboos related to numbers 12-23, and thus, misallocating them is less of a crime than misallocating 1-11.

Hence, we will look at the numbers 1 to 11, and see how teams have performed. Table 4 has this information:

Country  Score
Australia           7.45
USA           8.04
Iran           8.18
Greece           8.59
Nigeria           8.63
Ivory Coast           8.71
Costa Rica           8.84
Ghana           8.95
Croatia           8.96
Japan           9.06
Colombia           9.27
Brazil           9.29
Spain           9.66
Bosnia & Herzegovina           9.67
Uruguay           9.70
Honduras           9.71
Portugal           9.78
Italy           9.81
South Korea           9.83
Cameroon           9.83
Chile           9.87
Russia           9.94
England           9.97
Argentina         10.03
Switzerland         10.18
Netherlands         10.24
Algeria         10.30
Ecuador         10.34
Mexico         10.35
France         10.53
Belgium         10.88
Germany         11.00

 

Germany has a “perfect” first eleven, in terms of number allocation. Belgium comes close. At the other end of the scale, we have Australia, which seems to have the most misallocated 1-11 shirt numbers. Iran and Greece, which we anecdotally saw as having high misallocations, are third and fourth, with the United States second.

Note: The data is taken from the Guardian Data Blog. Now, this analysis should be taken with a pinch of salt since in the modern game, the division of players into “defender”, “midfielder” and “forward” is not straightforward. Where would you put a “classic number ten”? What about a wing back? And so forth.

Money can buy me Premier League performance

The following graph plots Premier League performance (in terms of points) for the 2012-13 season as a function of the team’s wage bill. Apart from a few outliers here and there, the correlation is astounding:

wageperformance

 

The red line is the line of best fit (according to a linear regression) and comparing team standings with respect to the line shows how well teams performed relative to what their wages would predict.

It is interesting to see that Manchester City almost fall off the charts in terms of wages, yet they could not translate this to on-pitch performance. It can also be seen that Manchester United, Spurs and Everton significantly over-performed given their wage bills.

Based on the wage bill, it would also have been reasonably easy to predict that Wigan Athletic and Reading would get relegated at the end of the season, though it must be mentioned that both underperformed their wage bills. QPR, on the other hand, should have done a lot better given the size of their pay packet.

A simple linear regression of points against wage bill shows that every GBP 4 million increase in the wage bill is associated with one additional point in the Premier League! And the regression has an R-square of 69%, which means that the team’s wage bill can explain 69% of the variation in the team’s performance. That is a remarkably strong relationship for a single variable.
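For anyone who wants to try this at home, a regression of this kind can be sketched in a few lines. The wage bills and points below are made-up numbers, not the actual 2012-13 figures, and `fit_line` is just an illustrative helper that does ordinary least squares by hand.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical wage bills (GBP million) and league points for seven teams.
wages = [50, 60, 80, 100, 130, 170, 200]
points = [36, 41, 46, 44, 61, 72, 75]

slope, intercept, r2 = fit_line(wages, points)

# A positive residual means the team did better than its wage bill predicts.
residuals = [p - (intercept + slope * w) for w, p in zip(wages, points)]

print(f"GBP {1 / slope:.1f} million per extra point, R-square = {r2:.2f}")
```

The residuals are what the graph shows visually: teams above the line have positive residuals, teams below it have negative ones.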

The screenshot of the regression is given below: wagerank

 

Note that in this post we only use the wage bill and not any transfer fees paid. However, the assumption is that the two are reasonably correlated and we are not losing out on any information by using only the wage bill.

 

 

Liverpool FC, this season

For a Liverpool fan, this has easily been the best footballing season since 2008-09. Based on the performance so far, however, I would still rate the 2008-09 performance higher – primarily because Liverpool came back to win several games that season – something they’ve not managed this season. Here are some pertinent observations from the season so far:

  • Aly Cissokho is the new Djimi Traore (for those who don’t remember, he was Liverpool’s left back in the Champions League winning team in 2005. He’s been branded as ‘the worst player ever to win the Champions League’. Among other things he played Crespo onside twice for Milan’s second and third goals in that game)
  • Liverpool against Aston Villa two weekends back reminded me of Liverpool versus Milan in 2005. Back then, Rafa Benitez had dropped holding midfielder Dietmar Hamann and played Xabi Alonso and Steven Gerrard as central midfielders, and they got badly overrun.

    Here, Brendan Rodgers went with a midfield of Jordan Henderson (the new Gerrard, more on that later) and Gerrard (now a wannabe Alonso), and they got similarly overrun. The only time Liverpool looked threatening was when Lucas Leiva was on the pitch for 20 minutes of the second half

  • When Kenny Dalglish bought Henderson in 2011, it seemed like the Liverpool team had too many “Gerrards”. There was Gerrard himself, there was Alberto Aquilani (remember?) and there was Raul Meireles (yet another player in the traditional “Gerrard role”) when Henderson came in. And Jonjo Shelvey was coming up the ranks.

    Two and a half years hence, Henderson has established himself as the Number One Gerrard, ahead of Gerrard himself, who now plays more like the 4 he wears for his country than the 8 he wears for his club. Meireles and Aquilani were sold soon after Henderson arrived, Shelvey went last season (a mistake IMHO. He should’ve been loaned out) and Gerrard has moved back.

  • With Liverpool gifting West Brom a goal after not playing out properly from the back, one of the two monkeys on Liverpool’s back has bitten. Simon Mignolet is nowhere near as good as Pepe Reina as a distributor (though he’s much much better as a shot stopper), and the Toure-Skrtel partnership has always looked vulnerable playing out from the back. This was bound to happen and it’s good it happened. They’ll be more careful playing out from the back henceforth.
  • The other monkey on Liverpool’s back waiting to bite is Skrtel at set pieces. His natural strategy this season has been to grab the opponent’s tallest player. So far referees have overlooked it, and a penalty is waiting to be conceded. Hopefully it happens in a game where Liverpool don’t drop points on account of it.
  • A big issue with Aly Cissokho at left back is that when he ventures forward (typically with little success), he doesn’t track back quickly enough and leaves Liverpool short of support in case the opponent breaks on a counterattack. Hence in the game against West Brom it was pleasing to see Daniel Sturridge having moved back into a left back position to cover when Cissokho got isolated on one of his ventures forward.
  • Once Jon Flanagan is fit enough to last 90 minutes (he isn’t yet, it seems), Cissokho should be dropped, Flanagan should go to left back and Kelly should play at right back. Cissokho is an abomination.
  • Liverpool’s injury list currently reads: Right back: Glen Johnson, Centre backs: Mamadou Sakho and Daniel Agger, Left back: Jose Enrique, Holding midfield: Lucas Leiva. Another central midfielder, Joe Allen, recently came off that list. Gerrard, Sturridge and Coutinho have also been injured at some point in time this season.
  • The most joyous thing about watching Liverpool in 2008-09 was their comebacks. They came back from a goal down to beat Manchester United 2-1 at Anfield (I still remember that Ryan Babel strike that settled that game). They then came back from 2-0 down to beat Manchester City 3-2, and repeated that effort against Wigan. They almost repeated it against Hull but could only draw 2-2. Apart from the Villa game, such comebacks have been absent this season. And Liverpool have let leads slip way too many times.
  • I’m not saying anything about the Suarez-Sturridge partnership up front – the results are there to see. One thing I’ll say, though, is that I don’t like the “SAS” acronym – simply because the “A” stands for “and”. Now if only Iago Aspas could magically improve next season and become the A in SAS..
  • I have this tracker going all season that tries to predict where Liverpool will end up. This is based on quality of opposition faced. Liverpool have been a consistent fifth according to this tracker. Look at the MS Score here.

The Problem With American Sport

There was a basketball epidemic when I was in high school. It was probably a result of two things – we used to play basketball regularly in school, and Star Sports (or was it still Prime Sports?) had started showing live games from the NBA. Everyone in school would talk about basketball. Your knowledge of basketball went beyond the Magic Johnsons and Michael Jordans. You learnt about teams with wonderful names such as “Utah Jazz”. And for reasons completely unknown to me, despite having never watched him play (I still haven’t) Patrick Ewing became my favourite player.

So one morning I decided to see what the fuss about NBA was all about, and watch a game. It made for horrible viewing. There were great plays, of course. It was a great spectator sport in that sense. But what annoyed me endlessly were the time outs and consequent advertising breaks. Just when I would get settled into the rhythm of the game, someone would call a time out and for the mid 90s, two minutes of advertising was a really long time!

I still continued to watch, for “pseud value”, so that I could talk about it in school. However, I could never get the kind of engagement that I could get with cricket (then) or football (now). The game was simply way too discontinuous. An NBA game is supposed to last 48 minutes, but these things would last three times as long. I don’t think I watched more than 2-3 games.

As everyone on my Facebook timeline talks about the Super Bowl, the only thing I can think of is how unwatchable American Sport is. I understand that you need the ads to fund the game, and that greater advertising revenue means greater revenue for players and hence greater quality of sport. What irks me, however, is that these ads end up causing much discontinuity in the sport.

So this morning I was thinking about why I get irked so much about ads in American sports (basketball, american football, etc.) while I can still watch cricket, which has a fair share of ads. The answer lies in randomness. I know when a cricket telecast will switch to ads – at the end of every over, at the fall of a wicket, or in an innings break. When an advertisement comes in a cricket broadcast, I’m prepared for it (except of course, when greedy broadcasters cut to ads before the full over is bowled). It is a similar case in tennis, where I expect to switch to advertisements after every two games – there is a rhythm to it.

In American sport, it is not so. Teams can call for a timeout at any point in time, and that can completely put you off. The game cuts to advertisements at moments when you least expect it, and that can be a huge challenge for someone not used to it!

A year or so ago, I had attended this lecture on sports analytics in Bangalore, delivered by a University of Chicago professor. He said that the reason football hasn’t taken off in the US is because it is not television friendly. “Split a game into four quarters, introduce two time outs in each quarter, and you will see Major League Soccer taking off”, he said. The problem, however, is that this would simply ruin the continuity of the game – which is what a lot of people love about football. And looking at the funding of the clubs in the major European leagues, it is clear that football is making sufficient money from television in its current form, without any gimmickry.

An American colleague at my last job offered another perspective. “How can you watch a game continuously for 45 minutes”, he asked. “We are so used to breaks in play every few minutes that we can’t watch continuously for so long”. If I can extrapolate from this one data point and take it in conjunction with what the professor said, you know why football is not popular in the US.

When I woke up this morning I wanted to check if the Super Bowl was being telecast in India. Then I remembered my earlier experiences of trying to watch American football, and decided against it. It is too discrete a game for my liking. There are too many breaks in play. I’d any day watch rugby instead! It is a similar game but so much more elegant and continuous!

Trading and liquidity

Every time there is some activity in the football transfer market, you are likely to hear one of two things. Either a particular player was “a steal” or the buyer “overpaid”. You seldom hear that a player was bought or sold at a “fair price”. What drives this?

Note that the issue is not perception – if you look at the transfer dealings, you are likely to find that the general opinion of whether the transfer fee was too high or too low is in most cases fairly accurate. Even if it is not accurate at the time of the transfer, it gets borne out in the subsequent year or two after sale.

Two weeks back I took a class in introductory economics for a bunch of people who hope to get elected to the Bangalore Municipal Council (BBMP). Teaching them about demand and supply, and trade, I mentioned that in any voluntary trade, both the buyer and the seller are “winners”. For example, if Liverpool sold Fernando Torres to Chelsea for GBP 50 million, it means two things: One, the value that Liverpool placed on the future contribution of Torres to the club was less than GBP 50 million. Two, the value that Chelsea placed on the future contribution of Torres was more than GBP 50 million. If either of the above conditions were not true, the deal would not have happened.

So why is it that football transfers usually end up costing too much or too little? The answer lies in “liquidity”. Liquidity is a concept that is normally used in financial markets as a measure of the depth of the market. It measures how many people are willing to buy and sell a particular commodity at a particular point in time. The theory is that the greater the number of buyers and sellers for a particular commodity, the better is the price discovery. I’ve said this several times before – it is unfortunate that the concept of liquidity doesn’t find as much traction in mainstream economics literature.

Coming back to football: why is it that players are typically either undervalued or overvalued? Because players are unique, and that makes the market illiquid. Let us go back to the deal that took Torres to Chelsea. Let us say that the value Chelsea placed on his future services was GBP 50 million, and the value that Liverpool placed on his future services was GBP 35 million (numbers pulled out of thin air). Given that Liverpool owned him, this deal could have taken place at any value between these two numbers (note that at any price between 35 and 50 million, both Liverpool and Chelsea would be willing to trade)! So why did the deal take place at one end of the spectrum?

It was a consequence of how badly the two clubs wanted to do the deal. While Torres had lost form and hadn’t been performing in the 2010-11 season, Liverpool were quite happy holding on to him – they were not desperate to do the deal. Even when offered an amount higher than their valuation of the player, they sensed Chelsea’s desperation in doing the deal. So Liverpool’s game here was to hold on long enough until they knew Chelsea had bid an amount they were unlikely to improve on, and then they sold.

Sometimes fans like to sing something like “there is only one Fernando Torres” (typically when he scores). And that is the precise reason that Liverpool was able to get a premium on his sale. There was a certain kind of player whom Chelsea desperately wanted to buy, and Torres was the one who fit the bill perfectly. Given the lack of comparables, and the desperation of the buyer, it became a seller’s market and Liverpool were able to profit from it.

So we have seen here that when the buyer is more desperate to do the deal than the seller, the deal takes place at the higher end of the “value spectrum” (I just made up that phrase at this moment). It can go the other way also. When Liverpool sold Torres, they (rather unwisely) invested most of the proceeds in buying a player called Andy Carroll from Newcastle United. Carroll turned out to be a dud: he was increasingly injury prone, and when a new manager, Brendan Rodgers, came in, he found Carroll unsuitable for the style of football Liverpool wanted to play.

The presence of Carroll in the squad, however, would put pressure on the manager to play him – largely a consequence of the fee that had been paid to purchase him. To this end, Rodgers decided that it was better to cut his losses and remove Carroll from the squad, rather than play a suboptimal brand of football just so that Carroll was played. Rodgers correctly decided that the money that had been spent in buying Carroll was a “sunk cost”.

Now, in the year and a half since his arrival at Liverpool, Carroll had done much to convince people that he was overvalued. His injuries and lack of form meant that clubs were unwilling to value him highly, and given Liverpool’s determination to sell, it was a buyer’s market. The GBP 15 million that Liverpool extracted from West Ham for the sale was perhaps exactly the value that Liverpool had placed on Carroll.

To summarize – you sell if the price is higher than your valuation. You buy if the price is lower than your valuation. The buyer’s and seller’s valuations together determine the “value spectrum” along which a sale can be done. Presence of comparable commodities means that people can go for substitutes, and so that shrinks the value spectrum. In case of footballers with few comparables, there are no factors compressing the value spectrum, and the full extent of it is available.

In a large number of cases, one of the buyer and seller is much more desperate to do a particular deal than the other. And that pushes the price of the deal to one of the edges of the value spectrum. Hence people end up either significantly underpaying or significantly overpaying for footballers.
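The “value spectrum” idea can be put into a toy model. This is only a sketch under my own assumptions: the function names are made up, and the `buyer_desperation` parameter (0 when the seller is the desperate party, 1 when the buyer is) with straight-line interpolation between the two valuations is an illustrative simplification, not a claim about how real negotiations work.

```python
def value_spectrum(seller_value, buyer_value):
    """Return the (low, high) range of prices at which both sides gain.

    Returns None when the buyer values the player less than the seller does,
    in which case no voluntary trade can happen.
    """
    if buyer_value < seller_value:
        return None
    return seller_value, buyer_value


def deal_price(seller_value, buyer_value, buyer_desperation):
    """Pick a price on the value spectrum.

    buyer_desperation is a hypothetical knob in [0, 1]:
    0 puts the price at the seller's valuation (a buyer's market),
    1 puts it at the buyer's valuation (a seller's market).
    """
    if not 0 <= buyer_desperation <= 1:
        raise ValueError("buyer_desperation must be between 0 and 1")
    spectrum = value_spectrum(seller_value, buyer_value)
    if spectrum is None:
        return None
    low, high = spectrum
    return low + buyer_desperation * (high - low)


# The Torres example, using the made-up valuations from the text (GBP million).
print(deal_price(35, 50, 1.0))  # desperate buyer: top of the spectrum
print(deal_price(35, 50, 0.0))  # desperate seller: bottom of the spectrum
```

The presence of comparable players would correspond to a narrower gap between the two valuations, leaving less room for the price to swing either way.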

Premier League Sub-Tables

Ahead of the Chelsea-Liverpool game on Sunday, the pre-match show showed a “sub-table” of the premier league of how the top 8 teams had fared against each other. While by definition, the Premier League is played among 20 teams, and your result against Chelsea is as important as that against Crystal Palace, looking at sub-leagues like this one can help us gauge what the overall points table is trying to hide.

In this post I’ll just show the sub-league of the top of the league (top 3 to top 7) and bottom of the league (bottom 3 to bottom 7). Offered without further comment.
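For the curious, a sub-table like this can be built by keeping only the matches where both teams belong to the chosen group, and then awarding the usual 3 points for a win and 1 for a draw. The sketch below assumes results come as (home, away, home goals, away goals) tuples; the function name and the sample results are made up for illustration.

```python
from collections import defaultdict


def sub_table(results, teams):
    """Build a mini league table from matches played among `teams` only.

    results: list of (home, away, home_goals, away_goals) tuples.
    Returns a list of (team, points, played), sorted by points descending
    and then by team name.
    """
    teams = set(teams)
    points = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_goals, away_goals in results:
        # Ignore any match involving a team outside the sub-league.
        if home not in teams or away not in teams:
            continue
        played[home] += 1
        played[away] += 1
        if home_goals > away_goals:
            points[home] += 3
        elif home_goals < away_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    rows = [(team, points[team], played[team]) for team in teams]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Made-up results; team "Z" is outside the sub-league and gets filtered out.
sample = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 0, 1), ("A", "Z", 0, 3)]
print(sub_table(sample, ["A", "B", "C"]))
```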

topbottom7

Premier League – Home and Away

So we are halfway through what is easily the most competitive English Premier League in recent times. To illustrate, Liverpool were on top of the table at Christmas, and after two successive defeats, lie fifth at the New Year. Anything can happen this season and the top seven or eight teams are all still in contention.

One interesting factor this season, however, has been the fixture list. For example, only Manchester United among the top eight has played at Anfield (Liverpool’s home ground) this season. Liverpool has played every other top eight team away so far – which means they will face them all at home in the second half of the season.

On the other hand, Manchester City has hosted all top eight teams bar Chelsea so far! Which means they will be playing all these teams away in the second half of the season! Sunderland is currently bottom. However, they have hosted mostly top-half teams (which are significantly superior to them) and played away to other bottom half teams which are of comparable strength.

The following table illustrates who has hosted whom. The table is to be read row-wise. “H” and a red cell means that the team in the corresponding row has hosted the team in the corresponding column. A white cell and “A” implies otherwise. Notice that this is the only point of time in the season when we can do this analysis, for now everyone has played everyone else exactly once. Teams in this table are ordered in descending order of points.

plhomeaway

 

 

 

Can we have a metric of who has had the best set of home fixtures this season so far? For this, let us define a “home index”. For each team, this is calculated as the difference between the average points of the teams played at home and the average points of the teams played away.

Let me illustrate. Arsenal, for example have so far hosted Chelsea, Everton, Liverpool, Spurs, Southampton, Hull, Stoke, Villa and Norwich. These teams have an average of 28.6 points as of now. Arsenal has so far visited the rest of the clubs, viz. Manchester City, Manchester United, Newcastle, Swansea, Cardiff, West Brom, Crystal Palace, Fulham, West Ham and Sunderland. These teams average 22.6 points as of now. So Arsenal’s home index is 28.6 – 22.6 = 6.0
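The calculation is simple enough to sketch in code. The points and fixtures below are made up for a four-opponent mini-league (they are not the actual table), and `home_index` is just an illustrative name.

```python
def home_index(points, hosted, visited):
    """Average points of the teams hosted minus average points of the teams visited."""
    avg_hosted = sum(points[team] for team in hosted) / len(hosted)
    avg_visited = sum(points[team] for team in visited) / len(visited)
    return avg_hosted - avg_visited


# Hypothetical current points of four opponents.
points = {"A": 30, "B": 27, "C": 20, "D": 15}

# Our team has hosted the two strongest opponents and visited the two weakest.
print(home_index(points, hosted=["A", "B"], visited=["C", "D"]))  # 28.5 - 17.5 = 11.0
```

Plugging in Arsenal’s actual hosted and visited opponents would give the 28.6 and 22.6 averages quoted above.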

A positive home index implies a team has played more strong teams at home and weak teams away. It is not easy to say, however, whether this implies an easier second half of the season. Arsenal, for example, would be happy to host the weaker teams they have traveled to, but the fixture list means they will be traveling to more strong teams, which means the potential for dropping points is higher.

Among the relegation-threatened teams, Sunderland has the highest “home index”, and for them the season is likely to become better. In the second half, they will get to host the weaker teams, whom they can reasonably expect to beat, while traveling to stronger teams (to whom they’ve lost anyway) won’t change much. Thus, a high home index is a positive for lower-ranked clubs.

The following graph shows the home-away index for all clubs:

haindex

Notice that both Arsenal and Manchester City have had an easier run so far, hosting the better teams. It will be interesting to see how they perform in the second half of the season when they travel to the better clubs. Especially given that Liverpool and Chelsea have had a bad run of fixtures in the first half and are likely to improve in the second half.

Finally, the first part of the post assumes that teams are better off playing at home rather than away. Is this really true? To check this, let us look at the average points scored by teams in home and away games. This number should be taken with a pinch of salt, though – teams with a high home index are less likely to perform significantly better at home compared to away.

Based on the performance so far, the average points in home games is 0.28 more than the average points in an away game. However, we also need to take into account the home index. When we regress the difference between home and away points against the home index, we find that for every one point higher in home index, the difference between home and away performance comes down by 0.04 (the R-square, for those that are interested in such things, is 15%).

The following table shows the difference in home and away performances of different teams:

homeaway2

Most teams, you can see from this table, have done better at home than away. The significant exceptions are Aston Villa, Manchester United and Spurs. Villa and Manchester United have a high Home Index which possibly explains this. There is no explanation of Spurs’ home form, though.

 

 

 

 

 

Why I became a Liverpool fan

In mid-April 2005, I was on the District Line train from Mansion House to South Kensington, in London, and at Victoria station, a huge number of people got on to the train. They were all dressed in red, and carrying Liverpool scarves and cans of Carlsberg beer. They were on their way to Stamford Bridge, to watch Liverpool take on Chelski in the Champions League semis. And they started singing.

It was magical, as they first sang “You’ll Never Walk Alone”, and then followed it up with personalized songs for each of the players, and for the coach Rafa Benitez. I remember one going “Steve Gerrard Gerrard, pass the ball forty yards .. ” . And another, to the tune of “La Bamba”, going “Rarararararafa Benitez, Xabi Alonso, Garcia and Nunez” (honouring all the Spaniards in the team). I was sold.

Till then, I hadn’t been much of a football fan, though I would watch the odd World Cup or Euro game. I had never really followed club football, and never supported any team. That day, things changed. I went to a crowded pub in Kensington to watch the game; perhaps I was the only Red fan there. I got to know the names of the Liverpool players (I’d heard of Gerrard and Milan Baros thanks to their exploits in Euro ’04, and I knew Alonso, Garcia and Nunez (never saw him play) thanks to the song). And I quietly cheered for Liverpool in that semi final.

It has been a roller coaster ride for the last eight-odd years, with more downs than ups. The undoubted high came just a month after I’d declared myself a Liverpool fan, when they came back from 0-3 down to beat Milan in the Champions League final in Istanbul. There have been several low points; the one that hurts the most is their failure to win the Premiership in 2008-09, when they came a close second. And then, they were to sell Xabi Alonso, who had been my favourite player.

The kind of passion I feel when I watch Liverpool play is unmatched, even by what I feel when I watch the Indian Test cricket team. There is a kind of tension that develops that I seldom feel otherwise. The disappointment when they lose (or fail to win) is the kind that I normally reserve for personal debacles.

And to think it all started with a random train ride with a bunch of loud drunks.

Identifying Groups of Death

The Guardian has an interesting set of graphics trying to identify the “Group of Death” at the forthcoming (2014) football World Cup. They have basically ordered groups and teams on three counts – something called “strength of schedule” (how it is calculated is not explained), average strength of each group (mean rating points) and the strength of each match (sum total of rating points of the teams playing). They don’t actually go on to identify which groups are the groups of death.

Another piece in the same paper gives a history of the concept of the Group of Death, and tries to explain why some groups can be classified so while others cannot. So in this post we will focus on precisely that – once a draw has been made, how do we identify groups of death? Without loss of generality, let us restrict our analysis to groups of four teams from which two qualify for the next round following a round robin (the format the World Cup uses). We will also restrict ourselves to analyzing the group stage and ignore chances of “death” in the knockout stages.

A “group of death” traditionally refers to a group where at least one “favourite” team gets knocked out. Assuming that a team with higher odds of winning the tournament is likely to beat one with lower odds, a group of death is necessarily one that contains at least three teams that are “favourites” to win the tournament.

From this, one way to measure groups of death is to order teams in decreasing order of odds of winning based on a reputed bookie’s odds, and then see how closely the top three teams of a group are clustered. The closer the three teams are to each other, the more competitive the group is. We can use a distance metric to measure this.

Another, simpler method is to look at the odds of the third-best team in each group winning the World Cup. The groups where the third-best team has the shortest odds of winning are the groups of death! Again, this is a relative metric, since if each group has two “strong teams” and two “weak teams” there is effectively no group of death (hence the earlier metric trumps this one).

Another way to identify how deadly the groups are is to use bilateral odds for each match, and to compute the odds that the two “seeded teams” in a group do not both qualify. For example, Group B has Spain, Netherlands, Australia and Chile, with the first two being the “seeded teams” (given their ranking). Now, we can calculate the probability that at least one of Spain and Netherlands doesn’t qualify. That gives the “death rating” for this group. The group for which this “death rating” is highest is the group of death.
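To make the “death rating” concrete, here is a minimal sketch that enumerates every possible set of results in a four-team round robin. The function name `death_rating`, the match probabilities and the tie-break rule (ties on points broken uniformly at random, rather than by goal difference) are all assumptions made purely for illustration; they do not come from any bookie.

```python
from itertools import combinations, permutations, product

def death_rating(teams, seeds, match_probs):
    """Probability that at least one seeded team finishes outside the top two.

    match_probs maps each pairing (first, second) to a tuple of
    (P(first wins), P(draw), P(second wins)).
    """
    fixtures = list(combinations(teams, 2))
    both_qualify = 0.0
    # Enumerate every combination of results: 0 = first wins, 1 = draw, 2 = second wins.
    for results in product(range(3), repeat=len(fixtures)):
        prob = 1.0
        points = dict.fromkeys(teams, 0)
        for (first, second), result in zip(fixtures, results):
            prob *= match_probs[(first, second)][result]
            if result == 0:
                points[first] += 3
            elif result == 1:
                points[first] += 1
                points[second] += 1
            else:
                points[second] += 3
        # Assumption: teams level on points are ordered uniformly at random,
        # so every final ordering consistent with the points is equally likely.
        orders = [
            order for order in permutations(teams)
            if all(points[order[i]] >= points[order[i + 1]]
                   for i in range(len(teams) - 1))
        ]
        favourable = sum(1 for order in orders if set(order[:2]) == set(seeds))
        both_qualify += prob * favourable / len(orders)
    return 1.0 - both_qualify

# Hypothetical match probabilities for Group B, invented for this example.
group_b = ["Spain", "Netherlands", "Chile", "Australia"]
hypothetical_probs = {
    ("Spain", "Netherlands"): (0.45, 0.28, 0.27),
    ("Spain", "Chile"): (0.55, 0.25, 0.20),
    ("Spain", "Australia"): (0.75, 0.17, 0.08),
    ("Netherlands", "Chile"): (0.45, 0.28, 0.27),
    ("Netherlands", "Australia"): (0.68, 0.20, 0.12),
    ("Chile", "Australia"): (0.55, 0.25, 0.20),
}
rating = death_rating(group_b, ["Spain", "Netherlands"], hypothetical_probs)
print(f"Death rating for Group B (hypothetical inputs): {rating:.3f}")
```

With real bilateral odds plugged in for every group, the group with the highest rating would be the group of death under this definition.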

As you can see, there are several ways of identifying the group of death. Unfortunately, none of the analysis that the Guardian has put out contributes to this. Let us now look at a couple of methods for ourselves. For the purpose of analysis I’m using the easiest available odds, which are from the Bleacher Report. Ideally, for this analysis we should be using odds before the draw was made – since the draw itself would have ended up adjusting odds. Nevertheless, since this is for illustrative (rather than predictive) purposes only, we’ll stick to the current odds.

Let us start with the easiest method, which is the odds of victory of the third best team in the group. Based on the Bleacher odds, the third best teams in each group are likely to be:

A Mexico 150/1
B Chile 33/1
C Japan 150/1
D England 28/1
E Ecuador 150/1
F Nigeria 250/1
G United States 150/1
H South Korea 150/1

Two teams stand out – Chile at 33/1 and England at 28/1. Based on this metric, the group of death is Group D (Italy, England, Uruguay, Costa Rica). The Guardian might say that Australia, Ghana or the United States might have the toughest draw, but the odds of each of them winning are so low that it doesn’t matter that they have tough draws!
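The same ranking can be reproduced in a few lines. This is only a sketch: it converts fractional odds of a/b into an implied probability of b / (a + b), ignoring the bookmaker’s margin (an assumption), and the helper names are made up for this example. The odds themselves are the ones listed above.

```python
from fractions import Fraction

# Third-best team in each group and its fractional odds, copied from the list above.
THIRD_BEST = {
    "A": ("Mexico", "150/1"),
    "B": ("Chile", "33/1"),
    "C": ("Japan", "150/1"),
    "D": ("England", "28/1"),
    "E": ("Ecuador", "150/1"),
    "F": ("Nigeria", "250/1"),
    "G": ("United States", "150/1"),
    "H": ("South Korea", "150/1"),
}

def implied_probability(fractional_odds: str) -> Fraction:
    """Convert odds 'a/b' (a to b against) into b / (a + b), ignoring any margin."""
    against, stake = (int(part) for part in fractional_odds.split("/"))
    return Fraction(stake, against + stake)

def group_of_death(third_best: dict) -> str:
    """Return the group whose third-best team is most likely to win the tournament."""
    return max(third_best, key=lambda g: implied_probability(third_best[g][1]))

# Groups ordered from the strongest third-best team to the weakest.
ranking = sorted(
    THIRD_BEST,
    key=lambda g: implied_probability(THIRD_BEST[g][1]),
    reverse=True,
)
print(ranking[:2])  # ['D', 'B']
```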

Let us now use another metric – the difference between the odds of the second and third placed teams in each group. One metric of the group of death might be where this difference is the minimum (this metric has the problem of classifying groups with one clear winner as groups of death, while they technically are not).

And this metric identifies Group F (Bosnia, Nigeria) and Group A (Croatia, Mexico) as groups of death. You might notice that these are Argentina’s and Brazil’s groups respectively, and those two teams are expected to sail through, so this is not a good metric.

Next, let us involve the top three teams of each group (to prevent the above anomaly) and look at the sum of the absolute difference in odds. For example, if the odds of the top three teams in a group are o1, o2, o3, we will measure each group by (|o1-o2| + |o2-o3| + |o3-o1|). The smaller this sum is, the more likely a group is a “group of death”.
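A small sketch of this measure follows. The function name and the two sets of inputs are hypothetical, chosen only to show how the number behaves; it also assumes the odds have already been converted to win probabilities. One thing worth noticing is that for three values the sum of pairwise absolute differences is always exactly twice the gap between the largest and the smallest, so the metric is really measuring the range of the top three.

```python
from itertools import combinations

def spread_metric(win_probabilities):
    """Sum of |o1-o2| + |o2-o3| + |o3-o1| over the three largest values."""
    top_three = sorted(win_probabilities, reverse=True)[:3]
    return sum(abs(a - b) for a, b in combinations(top_three, 2))

# Hypothetical win probabilities for two made-up groups (not real odds).
tight_group = [0.040, 0.038, 0.035, 0.001]     # three evenly matched contenders
lopsided_group = [0.250, 0.010, 0.007, 0.002]  # one overwhelming favourite

for name, group in [("tight", tight_group), ("lopsided", lopsided_group)]:
    top = sorted(group, reverse=True)[:3]
    # The metric equals twice the range of the top three values.
    print(name, round(spread_metric(group), 4), round(2 * (top[0] - top[2]), 4))
```

A smaller value means the top three are bunched together, which is what marks out a group of death under this measure.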

The results from this metric are below:

A 42% Brazil, Croatia, Mexico
B 16% Spain, Netherlands, Chile
C 7% Colombia, Japan, Greece
D 0.7% Uruguay, Italy, England
E 7% Switzerland, France, Ecuador
F 27% Argentina, Bosnia, Nigeria
G 23% Germany, Portugal, United States
H 10% Belgium, Russia, South Korea

From this metric, it is absolutely clear which the most competitive group is: Group D, with Uruguay, Italy and England, is unambiguously the group of death. Groups C and E come next according to this measure, followed by Group H.

Goodhart’s Law and getting beaten on the near post

I would have loved to do this post with data but I’m not aware of any source from where I could get data for this over a long period of time. Recent data might be available with vendors such as Opta, but to really test my hypothesis we will need data from much farther back – from the times when few football games were telecast, let alone “tagged” by a system like Opta. Hence, in this post I’ll simply stick to building my hypothesis and leave the testing for an enterprising reader who might be able to access the data.

In association football, it is more likely for an attacker to have a goalscoring opportunity from one side rather than from straight ahead. Standing between the attacker and the net is the opposing goalkeeper, and without loss of generality, the attacker can try to score on either side of the goalkeeper. Now, because of the asymmetry in the attacker’s position, these two sides of the goalkeeper can be described as “near side” and “far side”. The near side is the gap between the goalkeeper and the goalpost closest to the attacker. The far side is the gap between the goalkeeper and the goalpost on the farther side.

Red dot is goalkeeper, blue dot is striker.

 

However, my hypothesis is that this has not been the case recently. For a while now (my football history is poor, so I’m not sure since when) it has been considered shameful for a goalkeeper to be “beaten at the near post”. The argument has been that given the short distance between himself and the near post, the goalie has no business letting the ball in through that gap. Commentators and team selectors have been more forgiving of the far post, though. The gap there is large enough, they say, that the chances of scoring are high anyway, so it is okay if a goalie lets in a goal on that side.

Introductory microeconomics tells us that people respond to incentives. Goodhart’s Law states that

When a measure becomes a target, it ceases to be a good measure.

So with it becoming part of the general discourse that it is shameful for a goalkeeper to be beaten on the near side, and that selectors and commentators are more forgiving of goals scored on the far side, goalkeepers have responded to the changed incentives. My perception and hypothesis is that, with time, goalkeepers have been positioning themselves closer to their near post, and thus leaving a bigger gap towards the far post. And thus they are no longer minimizing the total chance of conceding a goal.

But isn’t it the same thing? Isn’t it possible that the goalkeeper’s optimal position for stopping any shot is the same as his optimal position for stopping a shot on the near side? The answer is an emphatic no.

Let us refer to the above figure once again. Let us assume that the chance of scoring when the angle is theta is f(theta). Now, we can argue that this is a super-linear function. That is, if theta increases by 10%, the chances of scoring increase by more than 10%. Again, we could use data to prove this, but I think it is mathematically intuitive. For the purposes of this argument, we take super-linearity to mean that 1. the function is strictly increasing, and 2. the derivative f'(theta) is also strictly increasing.

So, going by the above figure, the goalkeeper needs to minimize f(theta_1) + f(theta_2). If the total angle available is theta (= theta_1 + theta_2), then the goalkeeper needs to minimize f(theta_1) + f(theta - theta_1). Taking first derivative and equating it to zero we get,

f'(theta_1) = f'(theta - theta_1)

Because f is a super-linear function, we had argued earlier that its derivative is strictly increasing. Thus, the above equality implies that theta_1 = theta - theta_1 or theta_1 = theta_2 or f(theta_1) = f(theta_2).

Essentially, if the goalkeeper positions himself right, there should be an equal chance of getting beaten on the near and far posts. However, given the stigma attached to being beaten on the near post, he is likely to position himself such that theta_1 < theta_2, and thus increases the overall chance of getting beaten.
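The argument above can be checked numerically. The sketch below assumes a purely hypothetical convex curve, f(theta) = theta squared, which is not calibrated to any real shooting data; any function with a strictly increasing derivative should behave the same way. Since the derivative is increasing, the stationary point found above is a minimum rather than a maximum, and a simple grid search over theta_1 confirms that it sits at the midpoint.

```python
import math

def chance(angle, exponent=2.0):
    """Hypothetical convex scoring-chance curve f(theta) = theta ** exponent."""
    return angle ** exponent

def total_chance(theta_1, theta):
    """f(theta_1) + f(theta - theta_1): combined chance over near and far sides."""
    return chance(theta_1) + chance(theta - theta_1)

def best_near_angle(theta, steps=1000):
    """Grid search for the near-side angle that minimizes the total chance."""
    candidates = [theta * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: total_chance(t, theta))

theta = math.radians(30)  # assumed total angle open to the striker
optimum = best_near_angle(theta)
print(f"optimal theta_1 as a share of theta: {optimum / theta:.3f}")

# A goalkeeper who cheats towards the near post (theta_1 = 40% of theta)
# concedes a larger total chance than one who splits the angle evenly.
shaded = total_chance(0.4 * theta, theta)
balanced = total_chance(0.5 * theta, theta)
print(f"shaded: {shaded:.4f}, balanced: {balanced:.4f}")
```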

It would be interesting to look at data (I’m sure Opta will have this) of different goalkeepers and the number of times they get beaten on the near and far posts. If a goalie is intelligent, these two numbers should be equal. How good the goalkeeper is, however, is determined by the total odds of scoring a goal past him.