Following on from the last two posts, where I looked at a simple expected goals model built using machine learning, in this post I am going to describe how you can get involved and play around with the very same data.

The first thing you will need to do is gather the required data. This is very quick and simple. At football-data.co.uk go to the English Premier League historical results and download the data for 2021/22. Save this in a new folder and name it what you want, although I called my data file e02122.csv. This is the file that will contain the results and the betting odds.

The second thing to do is create a file called resultfiles.csv using a text editor, e.g. Notepad, and in it place the following line:

e02122.csv

If you have called it something else then enter that file name instead, and don't forget to press return to start a new line. You can gather more seasons than this, the above is just an example, but enter their file names into resultfiles.csv as well, one per line.
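If you would rather not type the list file by hand, here is a minimal Python sketch that writes it for you. The function name write_list_file is my own invention for illustration and is not part of the software. All it does is put one file name per line and finish each line with a newline, which is what pressing return does in Notepad.

```python
from pathlib import Path


def write_list_file(list_path, data_files):
    """Write one data file name per line, each ending with a newline."""
    text = "".join(f"{name}\n" for name in data_files)
    Path(list_path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Writes resultfiles.csv into the current folder; add more seasons to the list as needed.
    write_list_file("resultfiles.csv", ["e02122.csv"])
```

Run it from the folder where you saved the results data so the list file ends up alongside it.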

The next step is to gather some expected goals data. Go to the http://www.fbref.com website, click on the competitions tab and select Premier League.

Click previous season to move back to 2021/22

Now click the Scores & Fixtures tab to display the match scores for 2021/22. You will notice that this also contains xG (expected goals).

Now click share & export followed by Get table as csv

The results will now be displayed in csv format. I have not found a link to download the data as with football-data.co.uk, so you will have to copy and paste the results from the page into a Notepad file and call it Premexpgoals2122.csv
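Copying and pasting from a web page can leave stray or cut-off lines in the file, so a quick check before going further may save some head scratching. The sketch below is only a suggestion. The function name find_ragged_rows is my own, and the sample text is made up for illustration rather than real fbref output. It reports any row that has a different number of fields from the header row, skipping blank lines.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (row_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    # Blank lines come back from csv.reader as empty lists, so drop them here.
    numbered = [(n, row) for n, row in enumerate(reader, start=1) if row]
    if not numbered:
        return []
    width = len(numbered[0][1])
    return [(n, len(row)) for n, row in numbered[1:] if len(row) != width]


if __name__ == "__main__":
    # Made-up sample: the last row is missing two fields.
    sample = "Home,xG,Score,xG,Away\nTeam A,1.2,2-1,0.8,Team B\n\nTeam C,0.5,0-0\n"
    print(find_ragged_rows(sample))
```

To check your own file, read it with open("Premexpgoals2122.csv", encoding="utf-8").read() and pass the text in. An empty list back means every row has the same number of fields as the header.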

Almost finished. Now create a new file in Notepad called featurefiles.csv and enter into it the following line:

Premexpgoals2122.csv

Again, if you grab other seasons then enter their file names into featurefiles.csv as well.

There are a couple of other files that I will supply with the software, the first is called repteamnames.csv

Team names in the results and the xgoals files are not always the same, e.g. Man Utd and Manchester Utd, so the software will read the teams that need editing from this file and make the needed changes. If you run into any new discrepancies from earlier years, just add the from and to names to this file. When you look at the repteamnames file alongside Premexpgoals2122 and e02122 you will see why those teams are in repteamnames.
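To give a feel for what this renaming step involves, here is a small Python sketch. It is not the code inside the software. It assumes each line of the file is simply a from name followed by a to name separated by a comma, which may not match the real layout of repteamnames.csv, and the direction of the example mapping is also my guess. The function names load_replacements and normalise are made up for illustration.

```python
import csv
import io


def load_replacements(csv_text):
    """Build a {from_name: to_name} lookup from 'from,to' lines (assumed layout)."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def normalise(team, mapping):
    """Return the replacement name if one is listed, otherwise the name unchanged."""
    name = team.strip()
    return mapping.get(name, name)


if __name__ == "__main__":
    # Hypothetical content, using the example names from this post.
    replacements = load_replacements("Manchester Utd,Man Utd\n")
    print(normalise("Manchester Utd", replacements))
    print(normalise("Arsenal", replacements))
```

Any team not listed passes through untouched, which is why you only need to add the names that actually differ between the two sources.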

The other file I will supply is histmatchweights.csv

This file will contain the following

3
0.2,0.3,0.5

The 3 means that the software will gather the xgoals from a team's last 3 games

The 0.2,0.3,0.5 makes the software weight the last game with 0.5, the second last game 0.3 and the third last 0.2
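Here is a worked sketch in Python of how weights like these could be applied. I am assuming the software forms a straightforward weighted sum of the last few xG values, which I have not confirmed against its actual internals. The function names parse_weights and weighted_recent_xg are made up for illustration, and the xG numbers are invented.

```python
def parse_weights(text):
    """Read the two-line layout shown above: a count, then comma-separated weights."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])
    weights = [float(v) for v in lines[1].split(",")]
    if len(weights) != count:
        raise ValueError("number of weights does not match the count on line 1")
    return weights


def weighted_recent_xg(xg_history, weights):
    """Weighted sum of the most recent games.

    xg_history is ordered oldest first, and so are the weights, so the
    final weight applies to the most recent game.
    """
    n = len(weights)
    if len(xg_history) < n:
        return None  # not enough games played yet
    recent = xg_history[-n:]
    return sum(w * x for w, x in zip(weights, recent))


if __name__ == "__main__":
    weights = parse_weights("3\n0.2,0.3,0.5\n")
    # Invented xG values for a team's games, oldest first.
    print(weighted_recent_xg([0.7, 1.0, 2.0, 1.5], weights))
```

With the invented values above, only the last three games count: 0.2 x 1.0 + 0.3 x 2.0 + 0.5 x 1.5, which comes to about 1.55. A team with fewer than three games played gets no value at all in this sketch.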

You can play around with these values in order to make the software create different data

You can download the software createMLfootie.exe from the utilities section of http://www.smartersig.com along with the two files I supply mentioned above.

Once you have run the program it will create a file called MLfootie.csv. It is this file that you can load into MySportsAI and create models on.

The idea behind this is to dip a toe into data modelling for football, and perhaps with discussion and further ideas we can develop and enrich the data inputs. I already have a few ideas.

Best of luck and let me know how you get on
