Machine learning libraries like sklearn come with lots of ML algorithms: Neural Networks, Logistic Regression, Gradient Boosting Machines and so on. Off the shelf they all have one thing in common. If you give them a spreadsheet-like set of data they will try to predict one of the columns, depending on which one you specify. So if one of my columns contains zero where a horse (a row of data) lost and one where it won, then we can get the ML algorithm to create a model that is hopefully good at predicting those 0s and 1s. It will even give you a probability between 0 and 1 so that you can rank the horses in a race and perhaps just back the top ranked horse. This is all jolly good if we find this approach produces profit, but can we get an algorithm to predict profit? Would a model built to find profit work better than a model built to find winners?
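To make the "rank the horses and back the top one" step concrete, here is a minimal sketch in plain Python. The race ids, horse names and probabilities are made up for illustration, and top_rated_per_race is a hypothetical helper, not part of sklearn or any other library.

```python
from collections import defaultdict

# Hypothetical model output: (race_id, horse, predicted win probability).
predictions = [
    ("race_1", "Horse A", 0.21),
    ("race_1", "Horse B", 0.35),
    ("race_1", "Horse C", 0.12),
    ("race_2", "Horse D", 0.18),
    ("race_2", "Horse E", 0.09),
]

def top_rated_per_race(rows):
    """Return {race_id: (horse, probability)} for the highest rated horse in each race."""
    by_race = defaultdict(list)
    for race_id, horse, prob in rows:
        by_race[race_id].append((horse, prob))
    # The top rated horse is simply the runner with the largest probability.
    return {race: max(runners, key=lambda r: r[1]) for race, runners in by_race.items()}

top = top_rated_per_race(predictions)
for race, (horse, prob) in sorted(top.items()):
    print(race, horse, prob)
```

In practice the probabilities would come from whatever model you have trained; only the grouping by race and the pick of the maximum matter here.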

To find profit with an ML algorithm we have to change something called the loss function. So what is the loss function when it's at home? Let us think about a commonly used one, Mean Squared Error (MSE). If, say, a logistic regression model predicts Mishriff will win the Breeders' Cup Turf with a probability of 0.21 and he does win, then the error is 1 – 0.21 = 0.79

If on the other hand he loses then the error is 0 – 0.21 = -0.21

Now if we square the two potential errors we always get a positive number, namely 0.62 and 0.04 (rounded to two decimal places).

This is the squared error (SE), and if we take the average of these across all the predictions made in a year we have the MSE.

Hopefully you can see that if our model gives losers lower predicted probabilities and winners higher ones then we are heading in the right direction. If it's the other way round then we have a pretty crap model. The algorithm will attempt to minimize this MSE in its search for a good model.
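The arithmetic above can be checked with a few lines of plain Python. The 0.21 comes from the Mishriff example; the two three-horse prediction sets at the bottom are made up purely to show that a model that rates the winner higher ends up with a lower MSE.

```python
def squared_error(outcome, predicted):
    """Squared difference between the result (1 = won, 0 = lost) and the predicted probability."""
    return (outcome - predicted) ** 2

def mean_squared_error(outcomes, predictions):
    """Average of the squared errors across all predictions."""
    errors = [squared_error(o, p) for o, p in zip(outcomes, predictions)]
    return sum(errors) / len(errors)

se_if_wins = squared_error(1, 0.21)   # 0.79 squared, about 0.62
se_if_loses = squared_error(0, 0.21)  # 0.21 squared, about 0.04

# Hypothetical three-runner race where the first horse won.
outcomes = [1, 0, 0]
good_model = mean_squared_error(outcomes, [0.6, 0.2, 0.2])  # winner rated highest
bad_model = mean_squared_error(outcomes, [0.2, 0.6, 0.2])   # a loser rated highest

print(round(se_if_wins, 2), round(se_if_loses, 2))
print(round(good_model, 3), round(bad_model, 3))
```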

But we want to model for profit, not accuracy, so we need a different loss function to MSE. We need to create our own, what is commonly known in ML circles as a custom loss function, plug it into our algorithm and say, hey, use this and not the loss function you use by default.

You can do this with LightGBM and XGBoost but it is easier to do with Deep Learning and Keras. I am not going to go into the code detail here but I am going to share my findings after dipping my toe into this pool.

I created a loss function that would maximize variable stake profit proportional to the rating it produced for each horse in a race. In other words it is betting to win £1 on each horse in a race, but whatever profit or loss is made on each horse is multiplied by that horse's rating value. So if the top rated horse won with a rating of 0.25 the winnings would be £1 x 0.25, and of course the loss on the lower rated horses would be less because they have lower rating values. The profit on a race therefore goes up, and the loss comes down, when higher rated horses win.
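Here is a rough sketch of one way to read that description. It is not my Keras code. It is plain Python so the arithmetic is easy to follow, and it assumes that "betting to win £1" means staking 1 / (decimal odds - 1) on each horse, so a winner returns +£1 and a loser costs its stake. The function names, odds and ratings are made up for illustration. Inside Keras the same sums would have to be written with tensor operations so that gradients can flow through them.

```python
def race_profit(ratings, decimal_odds, won):
    """Rating-weighted profit for one race (assumed interpretation, see lead-in).

    ratings      : model output for each horse
    decimal_odds : decimal odds for each horse (assumed input, e.g. 3.0 means 2/1)
    won          : True for the winner, False for the losers
    """
    total = 0.0
    for rating, odds, is_winner in zip(ratings, decimal_odds, won):
        stake = 1.0 / (odds - 1.0)          # stake needed to win £1 at these odds
        pnl = 1.0 if is_winner else -stake  # win £1 or lose the stake
        total += rating * pnl               # scale the result by the horse's rating
    return total

def profit_loss(ratings, decimal_odds, won):
    """Loss to minimise: the negative of the rating-weighted profit."""
    return -race_profit(ratings, decimal_odds, won)

ratings = [0.25, 0.15, 0.10]
odds = [3.0, 5.0, 11.0]

top_rated_wins = race_profit(ratings, odds, [True, False, False])
lowest_rated_wins = race_profit(ratings, odds, [False, False, True])

print(round(top_rated_wins, 4), round(lowest_rated_wins, 4))
```

With these made-up numbers the race shows a profit when the top rated horse wins and a loss when the lowest rated one wins, which is the behaviour the loss function is meant to reward.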

Plugging this into a Deep Learning Neural Network using Keras produced the following results for top rated horses in each race (UK flat handicaps). I go on to compare this with a GBM model produced in MySportsAI using the same data but obviously designed to find winners.

First, the data for 2011 to 2015 was split chronologically into 80% for training and 20% for testing. If you have used a Neural Network before you will know that, because of the stochastic nature of NNs, you can train a model and get results from it, but if you retrain it then you will get different results (MySportsAI users, try this with the NN option). This is not the case with GBM. This does not mean NNs are unreliable, you just have to train and test a few times to get a reasonable picture or an average. Here are the results for top rated horses for 5 runs with a custom loss function in place.
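For anyone who has not done a chronological split before, a minimal sketch looks like this. The row layout (a dict with a "date" field in YYYY-MM-DD form) and the function name are assumptions for illustration; the point is only that the rows are sorted by date before cutting, so the test set is always later than the training set.

```python
def chronological_split(rows, train_fraction=0.8, date_key="date"):
    """Sort rows by date and cut once, so every test row is later than every training row."""
    ordered = sorted(rows, key=lambda r: r[date_key])  # ISO dates sort correctly as text
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical rows, deliberately out of order.
years = [2013, 2011, 2015, 2012, 2014, 2011, 2013, 2015, 2012, 2014]
rows = [{"date": f"{year}-06-01", "runner": i} for i, year in enumerate(years)]

train, test = chronological_split(rows)
print(len(train), len(test))
```

A random shuffle split would let the model peek at later races while training, which is why the cut is made on time order.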

Each run produced 3959 top rated bets

Run 1 ROI% 1.79 VROI% 2.45

Run 2 ROI% 5.05 VROI% 1.82

Run 3 ROI% -3.76 VROI% 1.45

Run 4 ROI% -0.08 VROI% 0.69

Run 5 ROI% 2.18 VROI% 3.21

The first thing I should mention about the above models is that, in line with common wisdom, I scaled the 3 input features so that they were all in a range of 0 to 1. This is something that is commonly advised for NNs, but I was about to find that the opposite was the case for my data, which surprised me.
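Scaling to a 0 to 1 range is usually done with min-max scaling. Below is a minimal sketch in plain Python with made-up feature values and hypothetical function names. It assumes the minimum and maximum are taken from the training data only and then reused on the test data, which is the usual way to avoid leaking information from the test period.

```python
def fit_min_max(train_column):
    """Learn the scaling range from the training data only."""
    return min(train_column), max(train_column)

def apply_min_max(column, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]  # constant feature, nothing to scale
    return [(x - lo) / span for x in column]

train_feature = [2.0, 4.0, 10.0]  # made-up values for one input feature
test_feature = [6.0, 12.0]

lo, hi = fit_min_max(train_feature)
scaled_train = apply_min_max(train_feature, lo, hi)
scaled_test = apply_min_max(test_feature, lo, hi)

print(scaled_train, scaled_test)
```

Note that a test value larger than anything seen in training comes out above 1, so "0 to 1" is only guaranteed on the training set.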

Here are the results without scaling.

Run 1 ROI% 10.53 VROI% 4.8

Run 2 ROI% 6.47 VROI% 2.06

Run 3 ROI% 2.79 VROI% 3.08

Run 4 ROI% 9.77 VROI% 7.79

Run 5 ROI% 9.49 VROI% 12.11

So how does a GBM algorithm perform with the same data but, obviously, no custom loss function?

ROI% 5.71 VROI% 5.66

Taking averages, GBM is slightly worse than the average performance of the unscaled NN using a custom loss function, which works out at ROI% 7.81 and VROI% 5.97 across the 5 runs.
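The averages are easy to check from the run figures listed above. This snippet just takes the mean of the five scaled and five unscaled NN runs and sets them beside the single GBM result.

```python
from statistics import mean

# (ROI%, VROI%) for each run, copied from the results above.
nn_scaled = [(1.79, 2.45), (5.05, 1.82), (-3.76, 1.45), (-0.08, 0.69), (2.18, 3.21)]
nn_unscaled = [(10.53, 4.8), (6.47, 2.06), (2.79, 3.08), (9.77, 7.79), (9.49, 12.11)]
gbm = (5.71, 5.66)

def average(runs):
    """Mean ROI% and mean VROI% across runs, rounded to 2 decimal places."""
    roi = round(mean(r[0] for r in runs), 2)
    vroi = round(mean(r[1] for r in runs), 2)
    return roi, vroi

print("NN scaled  ", average(nn_scaled))
print("NN unscaled", average(nn_unscaled))
print("GBM        ", gbm)
```

The unscaled NN comes out at 7.81 and 5.97 against 5.71 and 5.66 for GBM, while the scaled NN averages only 1.04 and 1.92.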

My next step was to look at how these two performed on validation sets, in other words other hold-out periods, i.e. 2016-17 data and 2018-19 data. First 2016/17. The first question to ask is which of the 5 runs I performed with the NN I should use. I tried the highest performing one first and this gave some weird results: the top rated horse was getting a rating of 0.99 or so, which suggests something went wrong. Probably the NN found what's called a local optimum and simply overfitted or, in layman's terms, got lucky in this case. Needless to say the results on 2016/17 were poor. Next I tried a mid-range model and this looked promising.


2016/17: GBM ROI% -1.66 VROI% 0.55 NN with loss function ROI% 8.08 VROI% 3.68


2018/19: GBM ROI% 6.23 VROI% 3.11 NN with loss function ROI% 4.12 VROI% 3.78

Another area of interest may be to use the ranking of the horse instead of the probability when multiplying the loss in the loss function. If you have any ideas of your own please comment and vote on the usefulness of the article.