One question I get asked quite regularly is how I calculate my odds for football. There are many different methods I use, and I never rely on just one model, but I thought I’d share here a very simple method of compiling odds for football.
The difficulty with many of the advanced statistical models is that they use more advanced statistics such as x,y shot locations and other data that is not publicly available, or they use programs that may not be easily accessible to the average fan due to lack of coding knowledge – programmes such as R or Python come to mind. The model that I will walk you through here uses data that is publicly available and needs only a basic knowledge of Excel to build.
*Note: This is intended for people relatively new to this area so may come across as very simplistic to people more experienced with statistics and with Excel.
Step 1: Get & Transform Data
Go to Joseph Buchdahl’s football-data.co.uk (and be sure to follow him on twitter @12Xpert). For this article I’ll focus on the Premier League, so download data for the EPL for the past 5 seasons (including this season).
We’ll be making two separate results sheets using this data. One using only this season’s results, and one using all results from the past 5 seasons.
When you open the downloaded files they should look like this:
Open a new excel file and copy and paste this data:
You’ll notice the two worksheets – Results – only this season’s data & HistResults – data from 5 seasons.
** NB – When copying data to the HistResults sheet make sure your columns are consistent. As bookmakers become defunct or their odds are not available, they are not included in the results. For example, in the data for 2012-2013, there are 3 columns for Gamebookers, however, in later seasons this data is not included. While this is not essential for our purposes, if you would like to use this data for other purposes – odds comparison, Asian handicap analysis etc.. it’s essential to keep your data consistent. When adding new seasons to the HistResults worksheet, copy the top row of the HistResults sheet and paste it above your new data so you have something that looks like this:
This way you can add/delete columns, or move new data to line up with previous data. As an example, let’s look at the odds for Interwetten:
As you can see, the new data we have just added does not line up with the previous results. Here we need to move our new data from Interwetten onwards 3 columns to the right. Click on the cell IWH (below GBH), press CTRL+SHIFT+DOWN, then CTRL+SHIFT+RIGHT. Cut the data by pressing CTRL+X, and paste it by clicking on the cell LBH and pressing CTRL+V. Our data should now look like this:
Check that the rest of the data lines up then delete the two rows with our headers so there is no text between the numbers.
Step 2: Gathering Stats
On a new worksheet, we are going to collect data from our Results. We need a row that contains each team for this season
**NB: The team names must match the names used in the results page. For example, using the team name “Spurs” will result in an error as they are recorded as “Tottenham”.
A fast way to do this is to copy all team names in column C of our results sheet and paste them from cell A2 on our new sheet. In your toolbar select DATA>REMOVE DUPLICATES and follow the prompts. This leaves us with this:
Next we need our column headers:
Now we have the skeleton of our first table set up, here are the formulas for each column.
In cell B2 add the following:
=(AVERAGEIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!L:L)+AVERAGEIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!M:M))/2
=(AVERAGEIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!N:N)+AVERAGEIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!O:O))/2
=(SUMIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!$E:$E)+SUMIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!$F:$F))/(SUMIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!$N:$N)+SUMIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!$O:$O))
Simply multiply C2 by D2.
Select cells B2 to E2 and copy down for all teams so we have something that looks like this:
Now we have completed the For columns, have a go at doing the Against columns by changing parts of the above formulas. Your results after a little cleaning up should like this:
That’s probably a good place to stop for now. I would recommend you try doing the above for the historical results also. Follow the steps outlined above, but get averages for the last 5 seasons rather than just the current one.
In Part II, we’ll take the data from the table we have just made to produce our first set of odds.
Let me know how you get on and if there are any questions feel free to ask either below or @petermckeever