salah – Pertinent Observations

4-2-4 and Huns

Last night, in the game at Stamford Bridge, Liverpool started with a formation that could have been described as a 4-2-4. Cody Gakpo nominally played in midfield, making it a more conventional 4-3-3, but he is a forward by trade, and having him there left Liverpool vulnerable down the left side for the whole of the first half.

This wasn’t the first time Liverpool lined up in a 4-2-4 without an obvious holding midfielder. For a while during the title chase of 2013-14, Liverpool lined up broadly similarly, with Gerrard and Henderson in central midfield, and Sterling, Sturridge, Suarez and Coutinho forming a front four.

And what characterised a lot of games in that title chase was Liverpool’s fast starts. I remember a game against Arsenal (I wasn’t watching) when Liverpool went something like 4-0 up very, very quickly. That was emblematic of that half season: very quick starts and lots of goals up front, followed by tiring quickly and having to hold on for dear life at the end of the game.

When Liverpool failed to score early, as in the game against Chelsea (when Gerrard famously slipped, and when Salah started for Chelsea), they would get immensely frustrated and look short of ideas. This is very different from recent years, when Liverpool have been able to conjure up last-minute equalisers and winners.

Anyway, yesterday seemed like 2013-14 again. Liverpool were clearly the better team in the first half hour, and only a very tight offside call prevented them from going 2-0 up. The profusion of forwards, and Alexis Mac Allister pinging balls to all parts of the front line, meant that Liverpool dominated.

Then the inevitable happened: Chelsea settled. Their midfield three got working, and soon Liverpool were massively overrun in midfield. Chelsea quickly got one back, almost got another, and dominated most of the rest of the game (until Liverpool took off Salah and Diaz for a pair of kids).

The thing with the 4-2-4 is that it is an unusual and incredibly attacking formation. The opposition will inevitably take time to settle down against it and figure out how to deal with it. And in that time, the attacking team needs to make merry and score as much as they can (Liverpool only got one).

Once the opposition settles down, the shortage of personnel in midfield can be quickly exploited and the opposition starts dominating the game.

As I was watching, I was reminded of Age of Empires II (The Conquerors expansion), which I used to play back in college. There, you can select the civilisation you want to play as (sometimes it’s “random”). A few people used to prefer playing as the Huns.

The thing with the Huns is that they don’t need to build houses (they are nomadic), and so their population can grow very fast. In an AoE game, if you are playing as the Huns, the only strategy is to attack quickly and cause enough damage in the opening stages that the opposition can’t recover. Once the opposition has settled down, the Huns’ speed advantage has lost its bite.

And so, playing a 4-2-4 in football is similar to playing as the Huns in AoE. You had better make a good start and inflict enough damage on the opposition in the early stages that they can’t sufficiently hurt you back once they’ve inevitably settled down.

Connecting these two topics – I heard on commentary last night that Liverpool has never won a game where a Hungarian has represented them. That trend continues after last night. Hopefully Dominik Szoboszlai can make amends soon.

Mo Salah and Machine Learning

First of all, I’m damn happy that Mo Salah has renewed his Liverpool contract. With Sadio Mane also leaving, the attack was looking a bit thin (I was distinctly unhappy with the Jota-Mane-Diaz forward line we used in the Champions League final; it lacked cohesion). Nunez is still untested in terms of “leadership”, and without Salah, Firmino would have been the only “attacking leader”.

(non-technical readers can skip the section in italics and still make sense of this post)

Now that that is out of the way, I’m interested in seeing one statistic (for which I’m pretty sure I don’t have the data). For each of the chances Salah has had, I want to look at the xG (expected goals) and whether he scored or not, and then look at a density plot of xG for both categories (scored or not).
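For the technically inclined, here is a minimal sketch of that comparison, assuming you had a list of xG values split by outcome. Instead of drawing the plot, it builds a Gaussian kernel density estimate for each group and measures how much the two curves overlap (near 0 means well separated, larger means the curves sit on top of each other). The xG numbers, the bandwidth of 0.08, and the helper names are all made up for illustration; they are not real Salah data.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with a Gaussian kernel."""
    n = len(samples)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def overlap(f, g, lo=0.0, hi=1.0, steps=1000):
    """Approximate the area under min(f, g) on [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += min(f(x), g(x))
    return total * width


# Hypothetical xG values, invented for illustration only (not real data).
typical_scored = [0.5, 0.6, 0.7, 0.75, 0.8]
typical_missed = [0.05, 0.08, 0.1, 0.12, 0.15]
salah_like_scored = [0.05, 0.1, 0.3, 0.6, 0.7]
salah_like_missed = [0.08, 0.12, 0.4, 0.65, 0.75]

BANDWIDTH = 0.08  # assumed smoothing width, chosen by eye
typical_overlap = overlap(gaussian_kde(typical_scored, BANDWIDTH),
                          gaussian_kde(typical_missed, BANDWIDTH))
salah_overlap = overlap(gaussian_kde(salah_like_scored, BANDWIDTH),
                        gaussian_kde(salah_like_missed, BANDWIDTH))
print(f"typical striker overlap: {typical_overlap:.2f}")
print(f"Salah-like overlap:      {salah_overlap:.2f}")
```

The same density functions could be evaluated on a grid and handed to a plotting library to get the actual pair of curves.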

For most players, this is likely to result in two very distinct curves: they are likely to score from a large share of high xG chances, and hardly ever score from low xG chances. For Salah, though, the two density curves are likely to be a lot closer.

What I’m saying is – most strikers score well from easy chances, and fail to score from difficult chances. Salah is not like that. On the one hand, he creates and scores some extraordinary goals out of nothing (low xG). On the other, he tends to miss a lot of seemingly easy chances (high xG).

In fact, it is quite possible to look at a player like Salah, see a few sitters that he has missed (he misses quite a few of them), and think he is a poor forward. And if you look at a small sample of data (or short periods of time) you are likely to come to the same conclusion. Look at the last 3-4 months of the 2021-22 season. The consensus among pundits then was that Salah had become poor (and on Reddit, you could see Liverpool fans arguing that we shouldn’t give him a lucrative contract extension since ‘he has lost it’).

It is quite possible that this is exactly the conclusion Jose Mourinho came to back in 2013-14 when he managed Salah at Chelsea (and gave him very few opportunities). The thing with a player like Salah is that he is so unpredictable that it is very possible to look at a small sample and conclude that he is useless.

Of late, I’ve been doing (or rather, supervising, and no pun intended) a lot of machine learning work. A lot of this has to do with binary classification: classifying something as either a 0 or a 1. Data scientists build models that output a probability score that the thing is a 1, and then use some (sometimes arbitrary) cutoff to decide whether the thing is a 0 or a 1.
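To make the cutoff step concrete, here is a small sketch with made-up scores and labels. The function names `classify` and `confusion_counts` are hypothetical helpers of my own, not from any particular library; the point is only that moving the cutoff changes which things get called a 1, and therefore how the errors are distributed.

```python
def classify(probabilities, cutoff=0.5):
    """Turn model probability scores into hard 0/1 labels using a cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for 0/1 predictions."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:  # predicted 0, actually 1
            counts["fn"] += 1
    return counts


# Hypothetical model scores and true labels, for illustration only.
scores = [0.12, 0.48, 0.51, 0.77, 0.93]
labels = [0, 1, 0, 1, 1]

for cutoff in (0.5, 0.3):
    predicted = classify(scores, cutoff)
    print(cutoff, predicted, confusion_counts(predicted, labels))
```

Lowering the cutoff from 0.5 to 0.3 here turns the 0.48 score into a 1, which fixes one false negative without adding a false positive; on real data the trade-off is usually less kind.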

There are a bunch of metrics in data science on how good a model is, and it all comes down to what the model predicted and what “really” happened. And I’ve seen data scientists work super hard to improve on these accuracy measures. What can be done to predict a little bit better? Why is this model only giving me 77% ROC-AUC when for the other problem I was able to get 90%?
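For reference, ROC-AUC can be read as the probability that a randomly chosen 1 gets a higher score from the model than a randomly chosen 0 (ties count as half). A brute-force sketch of that definition, with made-up scores, looks like this; it compares every positive with every negative, so it is only meant for small inputs.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(n_pos * n_neg), so only suitable for small inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75: 12 of the 16 positive/negative pairs are ordered correctly
```

On real data you would use a library implementation such as scikit-learn’s `roc_auc_score`, which should give the same value for binary labels.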

The thing is, if the variable you are trying to predict is something like whether Salah will score from a particular chance, your accuracy metric will be really low indeed, because he is fundamentally unpredictable. It is the same with some of the machine learning work: a lot of models are trying to predict something that is fundamentally unpredictable, so there is a limit on how accurate the model can get.
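One way to see that limit is with the Brier score (the mean squared error between predicted probability and the 0/1 outcome), which I am swapping in here for ROC-AUC because its floor has a simple closed form. Even a perfect model that knows each event’s true probability p can’t do better than an expected error of p(1-p) per event. The probabilities below are hypothetical, chosen only to contrast near-certain events with near coin tosses.

```python
import random


def irreducible_brier(true_probs):
    """Expected Brier score of a perfect model that predicts each event's true probability.

    The outcome is 1 with probability p, so the expected squared error is
    p*(1-p)**2 + (1-p)*p**2 = p*(1-p).
    """
    return sum(p * (1 - p) for p in true_probs) / len(true_probs)


def simulated_brier(true_probs, repetitions=20_000, seed=0):
    """Monte Carlo check: draw outcomes and score the perfect model against them."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(repetitions):
        for p in true_probs:
            outcome = 1 if rng.random() < p else 0
            total += (p - outcome) ** 2
            count += 1
    return total / count


# Hypothetical true scoring probabilities, for illustration only.
predictable = [0.05, 0.1, 0.9, 0.95]    # nearly certain either way
unpredictable = [0.35, 0.4, 0.45, 0.5]  # close to a coin toss

print(irreducible_brier(predictable), simulated_brier(predictable))
print(irreducible_brier(unpredictable), simulated_brier(unpredictable))
```

The best achievable error is roughly 0.07 for the predictable set and about 0.24 for the unpredictable one, and no amount of model tuning closes that gap, because it comes from the events, not the model.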

The problem is that you will have come across so many problem statements that are much more predictable that you think it is a problem with you (or your model) that you can’t predict better. Pundits (or Jose) would have seen so many strikers who predictably score from good chances that they think Salah is not good.

The solution in these cases is to look at aggregates. Looking at each individual prediction will not take us anywhere. Instead, can we check, over a large set of data, whether we broadly got it right? In my “research” for this blogpost, I found this.
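A minimal sketch of the aggregate view, using an invented handful of chances (not real data): the per-chance error is large because the finisher scores from two low-xG chances and misses two big ones, yet total goals and total xG come out level.

```python
def per_chance_error(xg, goals):
    """Mean absolute gap between each chance's xG and what actually happened (0 or 1)."""
    return sum(abs(x - g) for x, g in zip(xg, goals)) / len(xg)


def aggregate_gap(xg, goals):
    """Actual goals minus total xG across all chances."""
    return sum(goals) - sum(xg)


# Hypothetical chances for an unpredictable finisher; invented numbers, not real data.
xg = [0.1, 0.8, 0.05, 0.6, 0.3, 0.15]
goals = [1, 0, 1, 0, 0, 0]

print(f"mean per-chance error: {per_chance_error(xg, goals):.2f}")
print(f"goals minus xG:        {aggregate_gap(xg, goals):.2f}")
```

Judged chance by chance, this looks like a terrible model; judged over the whole set, it is spot on.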

Last season, on average, Salah scored precisely as many goals as the model would’ve predicted! You might remember stunners like the one against Manchester City at Anfield; set those against the sitters he missed, and you can see how things averaged out.

What Ails Liverpool

So Liverpool FC has had a mixed season so far. They’re second in the Premier League with 36 points from 14 games (the only points dropped being draws against ManCity, Chelsea and Arsenal), but are on the verge of going out of the Champions League, having lost all three away games.

Yesterday’s win over Everton was damn lucky, down to a freak 96th-minute goal scored by Divock Origi (I’d forgotten he’s still at the club). Last weekend’s 3-0 against Watford wasn’t as comfortable as the scoreline suggested, since the scoring was opened only midway through the second half. The 2-0 against Fulham before that was similarly close-fought.

Of concern to most Liverpool fans has been the form of the starting front three – Mo Salah, Roberto Firmino and Sadio Mane. The trio has missed a host of chances this season, and the team has looked incredibly ineffective in the away losses in the Champions League (the only shot on target in the 2-1 loss against PSG being the penalty that was scored by Milner).

There are positives, of course. The defence has been tightened considerably: Liverpool aren’t leaking goals the way they did last season, and there have been quite a few clean sheets so far. There has been no repeat of last season’s situation where they went 4-1 up against ManCity, only to quickly let in two goals and set up a tense finish.

So my theory is this: each of Liverpool’s front three has an incredibly low strike rate. I don’t know if the xG stat captures this, but the number of chances each of Mane, Salah and Firmino needs before converting one is rather high. If the average striker converts one in two chances, all of these guys convert one in four (these numbers are pulled out of thin air; I haven’t looked at the statistics).

And even during the “glory days” of last season, when Liverpool were scoring like crazy, this low strike rate remained. What helped then was a massive increase in the number of chances created. In the one game I watched live (against Spurs at Wembley), what struck me was the number of chances Salah kept missing. But as the chances kept coming, he ultimately scored one (Liverpool lost 4-1).

What I suspect is that as Klopp decided to tighten things up at the back this season, the number of chances being created has dropped. And given the low strike rate of each of the front three, fewer chances translate into a much lower number of goals. If we want last season’s scoring rate, we might also have to accept last season’s concession rate (though this season’s goalie is much, much better).
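The arithmetic behind this can be sketched in a few lines, using the made-up “one in two” and “one in four” strike rates from above. It assumes, as a simplification, that every chance converts independently at the same rate; the chance counts (12 dropping to 8) are hypothetical too.

```python
def goals_from(chances, conversion_rate):
    """Expected goals if every chance converts independently at the same rate."""
    return chances * conversion_rate


def chances_needed(goals, conversion_rate):
    """Chances that must be created, on average, to produce `goals` goals."""
    return goals / conversion_rate


AVERAGE_STRIKER = 0.5          # "one in two", made-up figure from the post
LIVERPOOL_FRONT_THREE = 0.25   # "one in four", also made up

# How many chances are needed for three goals at each strike rate.
for rate in (AVERAGE_STRIKER, LIVERPOOL_FRONT_THREE):
    print(rate, chances_needed(3, rate))

# Hypothetical: tightening the defence cuts chance creation from 12 to 8.
print(goals_from(12, LIVERPOOL_FRONT_THREE), goals_from(8, LIVERPOOL_FRONT_THREE))
```

At one in four, three goals need twelve chances, twice what an average striker needs, so a side built around such finishers depends heavily on chance volume, and cutting creation by a third cuts goals by a third.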

There ain’t no such thing as a free lunch.