When developing machine learning models for horse racing, we quite rightly need some way to evaluate how successful they are. Horse race betting is a bit different from applications such as face recognition or breast cancer diagnosis, because those are all about the accuracy of predictions. With betting, accuracy clearly has a part to play, but it's not the complete picture. A less accurate model (in terms of predicting winners) could be more profitable, and for that reason we tend to focus on profit. The two most common profit measurements are flat stake profit and variable stake profit. The former simply means putting £1 on every selection, for example the top-rated horse in our ratings. Variable staking means we size each stake so that it would win £1 at the given odds. So, for example, a 2/1 shot would have a bet of 50p placed on it. Of course, in both cases the stake can be whatever you want it to be.
The advantage of variable stake monitoring is that it is not prone to inflation from one or two big-priced winners, which may give you a never-to-be-repeated profit that sends you skipping off to remortgage your house. As a result, variable stake monitoring gives a more realistic impression of possible future performance.
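To make the two staking methods concrete, here is a minimal Python sketch. It assumes each bet is recorded as a (fractional odds string, won) pair and ignores commission, non-runners and dead heats; the helper names and the list of bets are hypothetical, made up purely for illustration. The example shows how a single 33/1 winner can make flat stakes look profitable while variable stakes still show a loss.

```python
from fractions import Fraction


def fractional_odds(odds: str) -> Fraction:
    """Turn fractional odds such as '2/1' or '11/4' into profit per unit staked."""
    numerator, denominator = odds.split("/")
    return Fraction(int(numerator), int(denominator))


def variable_stake(odds: str, target_win=1) -> Fraction:
    """Stake needed to win `target_win` at these odds, e.g. 2/1 -> 1/2 (50p to win £1)."""
    return Fraction(target_win) / fractional_odds(odds)


def flat_stake_pl(bets, stake=1) -> float:
    """Profit/loss from putting the same stake on every selection."""
    total = Fraction(0)
    for odds, won in bets:
        if won:
            total += stake * fractional_odds(odds)
        else:
            total -= Fraction(stake)
    return float(total)


def variable_stake_pl(bets, target_win=1) -> float:
    """Profit/loss when every stake is sized to win `target_win`."""
    total = Fraction(0)
    for odds, won in bets:
        if won:
            total += Fraction(target_win)
        else:
            total -= variable_stake(odds, target_win)
    return float(total)


# Hypothetical results: four losers followed by one 33/1 winner.
hypothetical_bets = [("2/1", False), ("3/1", False), ("5/2", False), ("4/1", False), ("33/1", True)]

print(variable_stake("2/1"))                           # 1/2, i.e. 50p
print(flat_stake_pl(hypothetical_bets))                # 29.0
print(round(variable_stake_pl(hypothetical_bets), 2))  # -0.48
```

Flat staking reports a £29 profit on the back of one big-priced winner, whereas variable staking shows a small loss on exactly the same selections.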
So what about the more traditional machine learning performance metrics? Should we bin them when developing ML models and simply focus on profit/loss? Probably not. A mixture of metrics can give us more confidence if all of them show improvement over a rival model.
Horse racing models often have to deal with a degree of imbalanced data. That is to say, the thing we are trying to predict (win or lose) usually contains far more zeros than ones; after all, our lines of horse data will clearly contain more losers than winners unless we have engineered the data in some way.
One metric that is useful for imbalanced data sets is the Brier Score, and what I am about to describe is its close cousin, the Brier Skill Score.
First of all, what is a Brier Score? Imagine we have a three-horse race with the following data:
horse, model probability, W/L (1 means won, 0 means lost)
Al Boum Photo, 0.5, 1
Lost In Translation, 0.3, 0
Native River, 0.2, 0
So our model gave Al Boum Photo a 0.5 chance and he won the race.
The Brier score for these 3 lines of data would be
((0.5 - 1)^2 + (0.3 - 0)^2 + (0.2 - 0)^2) / 3 = (0.25 + 0.09 + 0.04) / 3 = 0.1267 (to four decimal places)
Where ^2 simply means ‘squared’
Looking at the above, you can hopefully see that each term is just the squared gap between the probability we gave a horse and what actually happened (1 or 0). If the higher-rated horses tend to win and the lower-rated horses tend to lose, those gaps stay small and we get a lower Brier score than if races went the other way round. This is why a lower Brier Score means a ‘better’ score.
Next up is the Brier Skill Score (BSS). This compares the Brier Score against a baseline, because stating that the score above is 0.1267 does not give you an instinctive feeling of how good or bad it is. We just know it's better than, say, 0.2.
To calculate the BSS we first need a baseline to compare against. In this case we will opt for the naive approach of predicting a probability of 0.33 for every horse. Why 0.33? Because that is the proportion of 1s in the sample set (one winner from three runners). Across many races this will obviously come out at something more like 0.1. In other words, we are using the average likelihood of winning as the prediction for each horse. Substituting 0.33 in, we get
((0.33 - 1)^2 + (0.33 - 0)^2 + (0.33 - 0)^2) / 3 = 0.2222
Now, to calculate the BSS we divide the model's Brier score by the naive prediction's Brier score and subtract the result from 1:
1 - (0.1267 / 0.2222) = 0.43 (to two decimal places)
Negative values mean the model has less predictive value than the naive baseline probabilities, and a value of zero means it is no better than the baseline. Positive values (max = 1) mean the model is beating the naive baseline. Our single three-horse race is clearly kicking butt, but over many races that score would almost certainly come down; if your model is any good, it should hopefully stay above zero. More importantly, if you modify a model and its BSS goes up, you can be hopeful that the changes are worth sticking with.
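The Brier Score and Brier Skill Score calculations above can be reproduced with a short Python sketch. It assumes the model probabilities and the 1/0 results are held in two plain lists, and the function names are illustrative rather than taken from any library. By default the baseline is the exact proportion of winners in the sample (1/3 here rather than the rounded 0.33), though a fixed baseline such as 0.33 can be passed in. If you use scikit-learn, `sklearn.metrics.brier_score_loss` computes the same Brier score. The rival model's probabilities are hypothetical and only there to show a head-to-head BSS comparison on the same data.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted win probabilities and 1/0 results."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, baseline_prob=None):
    """BSS = 1 - BS(model) / BS(naive baseline).

    The naive baseline predicts the same probability for every horse; by default
    that is the proportion of winners in the sample.
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    if baseline_prob is None:
        baseline_prob = sum(outcomes) / len(outcomes)
    baseline_bs = brier_score([baseline_prob] * len(outcomes), outcomes)
    if baseline_bs == 0:
        raise ValueError("baseline Brier score is zero, so the BSS is undefined")
    return 1 - brier_score(probs, outcomes) / baseline_bs


# The three-horse race from the text: Al Boum Photo, Lost In Translation, Native River.
model_probs = [0.5, 0.3, 0.2]
results = [1, 0, 0]

print(round(brier_score(model_probs, results), 4))              # 0.1267
print(round(brier_skill_score(model_probs, results), 2))        # 0.43
print(round(brier_skill_score(model_probs, results, 0.33), 2))  # 0.43

# Hypothetical, less confident rival model on the same race.
rival_probs = [0.4, 0.35, 0.25]
print(round(brier_skill_score(rival_probs, results), 2))        # 0.18
```

On a real data set you would pass in every line of horse data rather than a single race, so the baseline settles towards the overall proportion of winners.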