In Part I we looked at gathering and cleaning data from football-data.co.uk, and building our first basic table. In this part we’ll discuss xGoals in a bit more detail and run through our first method for calculating odds, using the upcoming game between Liverpool and Everton.
The bread and butter of Fanalytics. Expected Goals (xGoals, xG) are everywhere. There are many ways of calculating them. Some are quite advanced – using location data and player lineups – while others are very primitive – like this method. However they are calculated, they all have the same objective: to assess/predict the deserved number of goals scored/conceded by football teams rather than the actual number of goals.
In the previous article, I showed a method of finding the xGoals scored and conceded for each team in the EPL. Here’s the table we made again:
First, let’s discuss some limitations of this data. This table only takes this season’s data into account, as such it means that this method is not reliable in the early stages of the season. This is why I recommended making a second table using the past 5 seasons worth of data also. Here is that chart:
Another is dealing with newly promoted teams. Much like dealing with our dataset that only takes this season into account, there is insufficient data to accurately analyse these teams early in the season. A workaround for this is to simply use the figures of the 3 relegated teams from the previous season until enough data is available. I disagree with this line of thinking though, as it assumes that those teams are of the same quality. For me, I rather use the average of each promoted and relegated team from the previous five seasons, which I will tackle in a later article in this series that will show you how to build a league table and analyse team over/under performance.
Let’s take a look at an upcoming match to see how we can get some kind of probabilities from our table. Before we jump into this I would like to show you some ways of automating the process.
Making an Odds Calculation Worksheet
On a new worksheet, select cells c2:e2 and click “Merge & Centre”.
Next, in your toolbar, click DATA>Data Validation and select LIST from the dropdown menu.
Click on the to the right of the source box and enter the following:
=’Shots and Goal Data’!$A$2:$A$21
This links the team names in the first table we created in Part I and gives us a drop down list of teams.
With this done once, we can simply select cell C2, press CTRL+C, and paste it in cell G2 by pressing CTRL+V. Now we have a drop down list for every possible match up in the current PL season. In our home box, select Liverpool, and in our away box, select Everton. I’ve added some extra cells which you can see here:
In cell L2, we need to find a way of taking the xGoals For and Against for both Liverpool and Everton without having to back and forth between sheets. To do this we can either use a VLOOKUP function, or an INDEX MATCH function. The index function is a lot more flexible and it is an asset to be comfortable with using it, so we’ll use it for our odds calculation (an in-depth look at formulas is something others have done much better than I could, but I may outline some of them in future articles).
So, in L2 we enter:
=INDEX(‘Shots and Goal Data’!$E$2:$E$21,MATCH(‘Odds Calculation’!L$1,’Shots and Goal Data’!$A$2:$A$21,0))
In cell L3, we need to change that $E$2:$E$21 part of the formula. This references the xG Scored column in our first table, but we need it to reference xG Conceded.
when we have that done, copy cells L2:L3 across to column M.
This shows us that using this season’s data, Liverpool have an xG for per game of 2.110 and an xG against of 1.231 per game. In order to find the xG for their match versus Everton though, we need to run another formula.
In cell P2, multiply Liverpools xG for (L2) by Everton’s xG against (M3) and add the home advantage:
In P3 do the same for Everton (M2*L3) without the Home Advantage of course.
This gives us each team’s expected goals for the match:
- Liverpool: 2.602 Goals
- Everton: 2.155 Goals
Now we have that set up, each time we change the teams in cells C2 and G2, this data will automatically update.
Finding the Odds
Right so, on to probably the only part you wanted to read. Now that we have our xGoals for the match between Liverpool and Everton, we can use these two inputs to calculate almost all odds in the upcoming match. To do this we need to use what is called the Poisson Distribution.
In my opinion, there are better methods for calculating odds (ZIP, Negative Binomial Dist) which I hope to cover later in this series, but a fantastic starting point is the Poisson.
The Poisson Distribution gives the range of possible outcomes occurring in a football match. The outcomes we will use are the possible scores that this match could finish with.
From cells A9:A19 enter the numbers 0-10. Do the same for cells B8:L8 so we get an outline like so:
In cell B9, enter the following:
Copy the formula across and down and set the number value to Percentage:
This shows us the range of possible outcomes in this game from 0-0 to 10-10. You’ll notice that the chance of a 0-0 is very low – too low (less than 1%), while the chance of a 10-10 finish is virtually 0%.
We can clean up the data and use some conditional formatting to highlight the key information:
The green area shows all of the scorelines where Liverpool win, the Orange area shows all scorelines where Everton win, and the yellow area shows the scorelines that end in a draw. By using conditional formatting, I have highlighted the top 5 results in the data. If we use Excels’s SUM function, we can add up all the probabilities of a Liverpool victory, of an Everton victory, and of the match ending in a draw.
This shows that Liverpool have a 48.65% chance of winning, there is an 18.50% chance of a draw, and Everton have a 32.84% chance of winning.
We can turn this into decimal odds by dividing the % chance by 1:
This gives the odds for the match at:
- Liverpool: 2.06
- Draw: 5.21
- Everton: 3.05
Currently, the average odds on offer show Liverpool as a 1.62 favourite, which is a much higher chance for Liverpool than our model. I have found that using 5 seasons of data works much better for well-established PL teams while using data from only this season works better for teams new to the Premier League, and teams that are clearly over/under acheiving – Leicester and Chelsea last season come to mind. When we use the xGoal data from our 5 season table, the odds are:
- Liverpool: 1.82
- Draw: 5.51
- Everton 3.70
with Liverpool expected to score 2.67 goals and Everton 1.89 goals, which is probably closer to reality with the chance of the match finishing 0-0 being better represented.
Here we can see that the 5 season figures work much better for this game, but there are a couple of games in the coming game week that will use our 1 season data instead:
Issues & Limitations
While it is a great starting point for those new to the area, there are issues with this method.
Firstly, it does not take into consideration factors such as line-ups, injuries, new signings, weather, motivation, derbies etc.. etc…
Secondly, the Poisson Distribution tends to under-estimate low scores and draws.
Lastly, Bookmaker margin is not considered in Poisson calculations. This is something we’ll take a look at in Part III when we expand the markets beyond simple Home-Draw-Away to Over/Under, Both to Score, European and Asian Handicaps, and so on.
Have a go at it yourself and let me know how you get on. Feel free to play around with layout etc… the more of your own ideas you can put into this and develop the better. If you have any questions please feel free to contact me below or @petermckeever
**note: an earlier version of this article displayed incorrect xG against figures for the 5 season data. This has been updated since. Thank you to @willemvdbrink for pointing it out!