
Make Your Betting Pay

~ Improve Your Horse Betting

Tag Archives: Machine Learning and horse racing

Class & Distance Drilling Down

03 Sunday Jul 2022

Posted by smartersig in Machine Learning

≈ Leave a comment

Tags

Machine Learning and horse racing, tweet

There is a tipping competition running at the moment on Twitter called Handinaps. You have to make 2 selections in every heritage handicap run this year. I have constructed two Machine Learning models to provide entries into the competition: one built around races over greater than 8f and the other for races up to 8f in distance. Today I got around to playing with the question of whether class or distance is a filter we should consider when constructing a model. By that I mean, would a model built on all handicaps work just as well as a number of models, all using the same input features but each constructed only on data for 5f, then data for 6f, and so on? With class it would be a question of constructing models on class 6 races, then class 5, etc. Would the sum of the parts be more predictive than one large model?
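To make the idea concrete, here is a minimal sketch of the monolithic-versus-split comparison. The row layout and the dist/won field names are invented for illustration, and the "model" is just a win-rate average standing in for a real MySportsAI model:

```python
# Sketch: one monolithic model versus one sub-model per race distance.
# Field names (dist, won) are hypothetical, not actual MySportsAI columns.
from collections import defaultdict

def train(rows):
    """Toy 'model': just the observed win rate of its training rows."""
    return sum(r["won"] for r in rows) / len(rows)

def split_train(rows, key):
    """Train one sub-model per value of `key` (e.g. race distance)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {k: train(v) for k, v in groups.items()}

rows = [
    {"dist": 5, "won": 1}, {"dist": 5, "won": 0},
    {"dist": 6, "won": 0}, {"dist": 6, "won": 0},
]

print(train(rows))                # one model on all handicaps: 0.25
print(split_train(rows, "dist"))  # one model per distance: {5: 0.5, 6: 0.0}
```

The question in the post is then whether the predictions from the per-distance dictionary of models beat the single model trained on everything.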

To get a feel for the answer to this question I took a model I had created and trained it on all flat handicap data for 2011 to 2017 using MySportsAI. I then tested the model on 2018 to 2019. The results were as follows

So this is the baseline model. First of all I decided to see if there was any promise in binary chopping the data, that is to say creating two models for class and two for distance. The two class models were for handicaps of class >= 5 and class < 5. Here are the results:

Class >= 5
VROI% top 3 -0.09% VPL top 3 -2.5
PL top 3 +676

Class < 5
VROI% top 3 -0.4%
PL top 3 -144

We can see the two models have not come close to matching the monolithic baseline model on these PL metrics. Now let's take a look at race distance. Here I trained and tested on race distance > 8 furlongs and then <= 8 furlongs:

Dist > 8
VROI% top 3 +0.47% VPL top 3 +10
PL top 3 +590

Dist <= 8
VROI% top 3 +0.78% VPL top 3 +24.2
PL top 3 +305

This is much closer to the overall performance with no criteria split, so I decided to drill down further into the individual race distances:

Dist = 5
VROI top 3 +1.32% VPL top 3 +9
PL top 3 +42

Dist = 6
VROI top 3 +1.48% VPL top 3 + 11
PL top 3 +218

Dist = 7
VROI top3 -2.61% VPL top 3 -19.8
PL top 3 -304

Dist = 8
VROI top 3 -0.39% VPL top 3 -3.88
PL top 3 +69

Dist = 9
VROI top 3 +0.48% VPL top 3 +0.8
PL top 3 +217

Dist = 10
VROI top 3 +5.31% VPL top 3 +36
PL top 3 +137

Dist = 12
VROI top 3 -2.97% VPL top 3 -18
PL top 3 +221

Dist > 12
VROI top 3 +2.55% VPL top 3 +15.4
PL Top 3 +426

The above is close to the performance of the overall baseline model, but not encouraging enough to think that splitting the model into sub-models by distance is worthwhile. Of course the input features play a part; your model may improve with such splitting and I encourage you to experiment. Incidentally, one-hot encoding the race distance into the baseline model and then training and testing did not improve matters.

Was improvement found anywhere? Well, yes actually, but not down the routes I was exploring above. Can you spot the difference?

Brier Skill Score and Horseracing

11 Saturday Jan 2020

Posted by smartersig in Profitable Punting with Python, Uncategorized

≈ 3 Comments

Tags

Machine Learning and horse racing, tweet

When developing machine learning models for horse racing we quite rightly need some way to evaluate how successful they are. Horse race betting is a bit different from applications such as face recognition or breast cancer diagnosis, which are all about accuracy of predictions. With betting, accuracy clearly has a part to play, but it's not the complete picture. A less accurate model (in terms of predicting winners) could be more profitable, and for that reason we tend to focus on profit. The two most common profit measurements are flat stake profit and variable stake profit. The former simply means putting £1 on every selection, for example the top rated in our ratings. Variable staking means we place a stake set to win £1 relative to the odds, so for example a 2/1 shot would have a bet of 50p placed on it. Of course in both cases the stake can be whatever you want it to be.

The advantage of variable stake monitoring is that it is not prone to inflation from one or two big priced winners, which may give you a never-to-be-repeated profit that sends you skipping off to remortgage your house. Variable stake monitoring therefore gives a more realistic impression of possible future performance.
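As a quick sketch, the two staking plans can be expressed as follows, assuming odds are held as a fractional multiple (so 2/1 is 2.0); this is illustrative code, not MySportsAI's implementation:

```python
# Flat staking vs variable staking, with odds as fractional multiples
# (2/1 -> 2.0). Purely illustrative; not the MySportsAI code.

def flat_pl(bets):
    """£1 on each runner: win the odds if it won, lose the £1 stake if not."""
    return sum(odds if won else -1 for odds, won in bets)

def variable_pl(bets):
    """Stake sized to win £1: stake = 1/odds, so a 2/1 shot gets 50p."""
    return sum(1 if won else -1 / odds for odds, won in bets)

bets = [(2.0, True), (4.0, False), (9.0, False)]  # (fractional odds, won?)
print(flat_pl(bets))      # 2 - 1 - 1 = 0
print(variable_pl(bets))  # 1 - 0.25 - 1/9 ≈ 0.64
```

Notice how the 9/1 loser costs a full point under flat staking but only 11p under variable staking; by the same token a 9/1 winner would add 9 points flat but only 1 point variable, which is exactly the inflation-damping effect described above.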

So what about the more traditional machine learning performance metrics? Should we bin them when developing ML models and simply focus on profit/loss? Probably not; a mixture of metrics can give us more confidence if all of them show improvement over a rival model.

Horse racing models often have a degree of imbalanced data. That is to say, the thing we are trying to predict (win or lose) usually contains far more zeros than ones; after all, our lines of horse data will clearly contain more losers than winners unless we have engineered the data in some way.

One metric that is useful for imbalanced data sets is the Brier Score, and what I am about to describe is its close cousin, the Brier Skill Score.

First of all, what is a Brier Score? Imagine we have a three horse race with the following:

horse, model probability, W/L (1 means won, 0 means lost)

Al Boum Photo, 0.5, 1
Lost In Translation, 0.3, 0
Native River, 0.2, 0

So our model gave Al Boum Photo a 0.5 chance and he won the race.

The Brier score for these 3 lines of data would be

((0.5 – 1)^2 + (0.3 – 0)^2 + (0.2 – 0)^2) / 3 = 0.1266

Where ^2 simply means ‘squared’

Looking at the above you can hopefully see that if the lower rated horses tend to lose and higher rated horses tend to win we will get a lower Brier score than if races were predicted the other way round. This is why a lower Brier Score means a ‘better’ score.

Next up is the Brier Skill Score (BSS). This measures the Brier Score against some other benchmark; after all, stating that the score above is 0.1266 does not give you an instinctive feeling for how good or bad it is. We just know it's better than 0.2, for example.

The BSS is calculated by first working out some sort of baseline measure we can compare to. In this case we will opt for a naive baseline of simply predicting every horse with a value of 0.33. Why 0.33? Because that is the percentage of 1's in this sample set. Obviously across many races it would come out at more like 0.1 or thereabouts. With 0.33 for every horse we can now calculate a Brier Score based on these naive probabilities. What we are doing is using an average likelihood as the prediction probability for each horse. Substituting this in we get

((0.33 – 1)^2 + (0.33 – 0)^2 + (0.33 – 0)^2) / 3 = 0.2222

Now to calculate the BSS we divide the model's Brier score by the naive prediction's Brier score and then subtract the result from 1:

1 – (0.1266 / 0.2222) = 0.4302

Negative values mean the model has less predictive value than the naive baseline probabilities; positive values (max = 1) mean the model is beating the naive baseline. Our single three horse race is clearly kicking butt; over many races that score would certainly come down, but if your model is any good it will hopefully stay above zero. More importantly, if you modify a model and your BSS goes up, then you can be hopeful the changes are worth sticking with.
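The whole calculation can be reproduced in a few lines of Python. Note it uses the exact fraction 1/3 for the naive baseline rather than the rounded 0.33, so the BSS comes out at 0.43 rather than the 0.4302 above:

```python
# Brier score and Brier Skill Score for the worked three-runner example.

def brier(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

probs = [0.5, 0.3, 0.2]   # Al Boum Photo, Lost In Translation, Native River
outcomes = [1, 0, 0]      # Al Boum Photo won

model_bs = brier(probs, outcomes)        # 0.38 / 3 ≈ 0.1267
naive_bs = brier([1 / 3] * 3, outcomes)  # 2 / 9 ≈ 0.2222
bss = 1 - model_bs / naive_bs            # 0.43

print(model_bs, naive_bs, bss)
```

Lower is better for the raw Brier score; higher is better for the BSS, with zero marking "no better than the naive baseline".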

Machine Learning What Should We Predict

20 Friday Sep 2019

Posted by smartersig in Uncategorized

≈ 2 Comments

Tags

Machine Learning and horse racing, Machine Learning Horse racing, tweet

I have been playing around with an ML model today and the purpose of this post is to hopefully promote some discussion about potential target fields.

When you feed data to an ML algorithm you need to define input features, e.g. whether the horse is a course winner, along with a target feature that the inputs have to predict. It is with the latter that I was running a simple experiment. I ran a model on four different target features to get a feel for whether one stood out from the others. The four varieties were as follows.

1. Good old fashioned: 1 if the horse won, 0 if it lost
2. 1 if the horse won or came second, 0 otherwise
3. 1 if the horse won, came second, or finished 3rd in a race with more than 8 runners, 0 otherwise
4. 1 if the horse finished in the first 4 and outran its odds, 0 otherwise

In the last case, outran its odds simply means that the horse's position in the betting was further out than its finishing position. For example, a horse finishing 2nd that went off favourite would be a 0, whereas a horse finishing 4th that was 5th in the betting gets a 1.
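For concreteness, the four encodings might be sketched like this, where pos, runners and odds_rank (position in the betting) are invented field names rather than the actual data columns:

```python
# The four candidate target encodings, as functions of one runner's record.
# Field names (pos, runners, odds_rank) are hypothetical.

def target1(r):  # plain win/lose
    return 1 if r["pos"] == 1 else 0

def target2(r):  # won or came second
    return 1 if r["pos"] <= 2 else 0

def target3(r):  # also count 3rd place in races of more than 8 runners
    return 1 if r["pos"] <= 2 or (r["pos"] == 3 and r["runners"] > 8) else 0

def target4(r):  # first 4 home AND outran its odds
    return 1 if r["pos"] <= 4 and r["odds_rank"] > r["pos"] else 0

# 4th home but only 5th in the betting: a 0 everywhere except option 4
runner = {"pos": 4, "runners": 10, "odds_rank": 5}
print(target1(runner), target2(runner), target3(runner), target4(runner))
# 0 0 0 1
```

A beaten favourite finishing 2nd would score 1 under options 2 and 3 but 0 under option 4, since it did not outrun its odds.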

I tested for both how the top rated performed and how simply backing horses above a threshold performed. This is a quick and dirty measure but the objective is to foster some discussion hopefully on other measures for target variables.

Option 1 produced

Toprated 7998 bets 1323 wins PL after comm’ +514pts ROI +6.42% Varpl +40.6

Option 2 produced

8012 bets 1328 wins PL +265.3 ROI 3.3% Varpl +43.9

Option 3 produced

8028 Bets 1365 wins PL +413 ROI +5.15% Varpl +91.5

Option 4 produced

8056 bets 1201 wins PL +235.9 ROI +2.92% Varpl + 67.79

When it came to simply backing any horse above a certain threshold on the ratings, option 3 performed best, followed by option 2, then option 1, and finally option 4.

The reason for trying the various options is that imbalanced data can affect the performance of ML algorithms, although the Gradient Boosting Tree based algorithm I am using suffers least. An imbalanced data set simply means fewer 1's than 0's; the closer you get to 50-50 on the target 1's and 0's, the more balanced the data is. Clearly adding placed runs increases the balance.
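A quick sketch of how adding placed runs shifts that balance, using an idealised ten-runner race:

```python
# Share of 1's in a label column: the closer to 0.5, the more balanced.

def balance(labels):
    return sum(labels) / len(labels)

win_only   = [1] + [0] * 9      # ten runners, one winner (option 1)
win_or_2nd = [1, 1] + [0] * 8   # winner plus runner-up (option 2)

print(balance(win_only))    # 0.1
print(balance(win_or_2nd))  # 0.2 -- closer to 50-50, so more balanced
```

Options 3 and 4 push the positive rate higher still, which is one plausible reason they behaved differently under the threshold test above.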

The question, however, is whether there are other options worth throwing at the algorithm. I would be happy to receive suggestions for other possible target fields in the comments section.

Profitable Punting With Python 1

30 Saturday Jan 2016

Posted by smartersig in Profitable Punting with Python

≈ 14 Comments

Tags

Machine Learning and horse racing

I have prepared some introductory sessions on machine learning for horse racing using Python and Scikit Learn. You do not need previous experience of either of these two tools, but it would help if you are at least familiar with some basic programming concepts; for example, it would help if you know what a FOR loop is and what an assignment statement is, even if not in Python.

The main data file will be freely available until Tuesday 2nd February for those who showed an initial interest. After this it will be in the utilities section of the http://www.smartersig.com web site. A modest members fee will enable you to access it.

The instructions will be freely available to all at all times.

OK, to get started you will need to have downloaded and installed Anaconda Python v3.4; see the previous blog post Profitable Punting with Python Intro for details.

Once this has installed create a folder in your anaconda folder called horseracing.

All comments, questions and feedback should be posted to this blog post, that way they can act as a FAQ source.

First of all download the following zip file, double click on it to reveal all the contained files and copy them into your horseracing folder.

http://www.smartersig.com/pythonpunting.zip

The next step is to download the following file into your horseracing folder. When you click the link it will probably display the contents in your web browser. Just right click the display and you will have the option to save the screen data to a file.

This file is now housed in the utilities section of the smartersig.com web site and is called aiplus12to14.csv

You now have the required files. To get started, first open an MS-DOS command window (the black box type).

Now navigate to your anaconda folder using the cd command, e.g. cd anaconda.

Kick start IPython Notebook by typing ipython notebook and pressing return. (Note: in the latest version this may now be jupyter notebook.)

Once the notebook is loaded you will be presented with a directory screen of folders. Double click on the horseracing folder (that you created) to go into that folder.

Now double click on the file ProfitablePuntingWithPython1.ipynb.

Follow the instructions within the displayed notebook.
