Make Your Betting Pay

~ Improve Your Horse Betting

Category Archives: Deep Learning

Predicting Profit Not Winners

Saturday 05 Nov 2022

Posted by smartersig in Deep Learning, Machine Learning

Machine learning libraries like sklearn come with lots of ML algorithms: Neural Networks, Logistic Regression, Gradient Boosting Machines and so on. Off the shelf they all have one thing in common: if you give them a spreadsheet-like set of data they will try to predict one of the columns, depending on which one you specify. So if one of my columns contains zero where a horse (a row of data) lost and one where it won, then we can get the ML algorithm to create a model that is hopefully good at predicting those 0s and 1s. It will even give you a probability between 0 and 1, so that you can then rank the horses in a race and perhaps just back the top-ranked horse. This is all jolly good if we find this approach produces profit, but can we get an algorithm to predict profit? Would a model built to find profit work better than a model built to find winners?

To find profit with an ML algorithm we have to change something called the loss function. So what is the loss function when it's at home? Let us think about a commonly used one: Mean Squared Error (MSE). If, say, a logistic regression model predicts Mishriff will win the Breeders' Cup Turf with a probability of 0.21 and he does win, then the error is 1 - 0.21 = 0.79

If on the other hand he loses, then the error is 0 - 0.21 = -0.21

Now if we square the two potential errors we always get a positive number, namely 0.62 and 0.04 (to two decimal places).

This is the squared error (SE), and if we take the average of these across all the predictions made in a year we have the MSE.
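To make that concrete, here is the same arithmetic in a few lines of Python (a toy illustration only, not code from any model):

import numpy as np

predicted = np.array([0.21, 0.21])  # the two scenarios above: he wins, he loses
actual = np.array([1.0, 0.0])       # 1 = won, 0 = lost

errors = actual - predicted         # 0.79 and -0.21
squared_errors = errors ** 2        # 0.6241 and 0.0441
mse = squared_errors.mean()         # the average squared error
print(squared_errors, mse)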

Hopefully you can see that if losers have lower predicted probabilities and winners have higher predicted probabilities, then we are heading in the right direction. If it's the other way round then we have a pretty crap model. The algorithm will attempt to minimize this MSE in its search for a good model.

But we want to model for profit, not accuracy, so we need a different loss function to MSE. We need to create our own, what is commonly known in ML circles as a custom loss function, plug it into our algorithm and say: hey, use this instead of the loss function you use by default.

You can do this with LightGBM and XGBoost, but it is easier to do with deep learning and Keras. I am not going to go into the code detail here, but I am going to share my findings after dipping my toe into this pool.

I created a loss function that would maximize variable stake profit in proportion to the rating the model produced for each horse in a race. In other words, it is betting to win £1 on each horse in a race, but whatever profit or loss is made on each horse is multiplied by the rating value. So if the top-rated horse won with a rating of 0.25 the winnings would be £1 x 0.25, and of course the loss on the lower-rated horses would be smaller because they have lower rating values. The loss/profit on a race is therefore reduced/increased when higher-rated horses win.
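For anyone curious, the heart of such a loss might look like the sketch below. To be clear, this is my reconstruction rather than the exact code behind these results: it assumes y_true holds the precomputed profit or loss of a bet to win £1 on each runner, and y_pred is the rating the network assigns.

import tensorflow as tf
from tensorflow.keras import layers, models

def profit_loss(y_true, y_pred):
    # y_true: precomputed profit/loss of a bet to win £1 on each runner
    #         (roughly +1.0 for a winner, -1/(odds - 1) for a loser)
    # y_pred: the rating the network produces for that runner
    # Keras minimises the loss, so we return the negative of the
    # rating-weighted profit, which makes training maximise profit.
    return -tf.reduce_mean(y_true * y_pred)

model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(3,)),  # 3 input features
    layers.Dense(1, activation='sigmoid')                   # rating between 0 and 1
])
model.compile(optimizer='adam', loss=profit_loss)

The negative sign is the whole trick: the optimiser still minimises, but what it now minimises is minus the rating-weighted profit rather than a prediction error.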

Plugging this into a deep learning neural network using Keras produced the following results for top-rated horses in each race (UK flat handicaps). I go on to compare this with a GBM model produced in MySportsAI using the same data but obviously designed to find winners.

First, data for 2011 to 2015 was split chronologically into 80% for training and 20% for testing. If you have used a neural network before you will know that, because of the stochastic nature of NNs, you can train a model and get results from it, but if you retrain it you will get different results (MySportsAI users can try this with the NN option). This is not the case with GBM. This does not mean NNs are unreliable; you just have to train and test a few times to get a reasonable picture or an average. Here are the results for top-rated horses for 5 runs with the custom loss function in place (the split and the repeated runs are sketched just below).
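For reference, a chronological split and repeated runs might look something like this. It is an illustrative sketch only: X and y stand for the feature matrix and targets sorted by date, and build_model is a hypothetical helper returning a freshly initialised network.

from sklearn.model_selection import train_test_split

# shuffle=False keeps the rows in date order, so the 20% test set
# is strictly later in time than the 80% training set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

# NNs are stochastic, so train several times and look at the spread
for run in range(5):
    model = build_model()  # hypothetical: returns a fresh, randomly initialised NN
    model.fit(X_train, y_train, epochs=50, verbose=0)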

Each run produced 3959 top rated bets

Run 1 ROI% 1.79 VROI% 2.45

Run 2 ROI% 5.05 VROI% 1.82

Run 3 ROI% -3.76 VROI% 1.45

Run 4 ROI% -0.08 VROI% 0.69

Run 5 ROI% 2.18 VROI% 3.21

The first thing I should mention about the above models is that, in line with common wisdom, I scaled the 3 input features so that they were all in the range 0 to 1. This is commonly advised for NNs, but I was about to find that the opposite was the case for my data, which surprised me.
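For anyone wanting to try it both ways, the usual 0-to-1 scaling is a couple of lines with scikit-learn (fitting on the training data only, so the test set's ranges don't leak in):

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                         # maps each feature into the range 0 to 1
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training-set ranges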

Here are the results without scaling.

Run 1 ROI% 10.53 VROI% 4.8

Run 2 ROI% 6.47 VROI% 2.06

Run 3 ROI% 2.79 VROI% 3.08

Run 4 ROI% 9.77 VROI% 7.79

Run 5 ROI% 9.49 VROI% 12.11

So how does the GBM algorithm perform with the same data but obviously no custom loss function?

ROI% 5.71 VROI% 5.66

Taking averages, GBM is slightly worse than the average performance of the NN using the custom loss function.

My next step was to look at how these two performed on validation sets, in other words other hold-out periods, i.e. 2016-17 data and 2018-19 data. First, 2016/17. The question to ask is which of the 5 runs I performed with the NN should I use. I tried the highest-performing one first and this gave some weird results: the top-rated horse was getting a rating of 0.99 or so, which suggests something went wrong. Probably the NN found what's called a local optimum and simply overfitted, or in layman's terms got lucky in this case. Needless to say, the results on 2016/17 were poor. Next I tried a mid-range model and this looked promising.

2016/17

GBM: ROI% -1.66, VROI% 0.55
NN with custom loss: ROI% 8.08, VROI% 3.68

2018/19

GBM: ROI% 6.23, VROI% 3.11
NN with custom loss: ROI% 4.12, VROI% 3.78

Another area of interest might be to use the ranking of the horse instead of the probability when multiplying the profit/loss in the loss function. If you have any ideas of your own, please comment and vote on the usefulness of the article.

Deep Learning and Horse Racing

Thursday 17 May 2018

Posted by smartersig in Deep Learning


Tags

Deep Learning, Horse Racing, Machine Learning

I came back inspired and fascinated from the cinema the other day, having sat with one other lone cinema-goer watching AlphaGo. AlphaGo is a deep learning program created by the company DeepMind to challenge the world champion Go player. Since the defeat of Kasparov, a world chess champion, by a computer program, the next mountain to climb was always Go. So far it had proved elusive, as the number of game permutations in Go makes chess look like noughts and crosses, and it was thought that Go might be just too difficult for an AI program. If you get a chance you must see the documentary, as it tracks the development, first beating the European champion and then the world champion. Even more interesting is the reaction of the huge crowd watching the event.

If you have read any of my other posts you will know that I have been impressed by the gains that seem to be possible with the machine learning algorithm gradient boosting. This algorithm seems to be the de facto Kaggle competition winner at the moment. Kaggle, if you are not familiar, is a website where data science hobbyists and pros take on submitted data sets and see who can produce the best machine learning solution. Inspired by Go, I finally got around to checking out deep learning and was not surprised to find further gains. I tested three approaches on a simple data set consisting of just two features, namely horse age and days since last run. In all three cases I trained the models on two years of flat handicap data and tested them on one year of handicap data. Deep learning came out ahead of GBM, which in turn beat random forests in terms of profit and loss of top-rated horses.
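As a rough idea of the shape of that experiment (not the actual code; the data loading, the column names and the profit calculation are all placeholders):

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

features = ['age', 'days_since_run']  # hypothetical column names
# train and test are assumed to be pandas DataFrames holding two years of
# handicap data and one later year respectively, with a binary 'won' column
gbm = GradientBoostingClassifier().fit(train[features], train['won'])
rf = RandomForestClassifier().fit(train[features], train['won'])

# rank runners within each race by predicted win probability, then
# track the profit/loss of backing the top rated in the test year
test['gbm_prob'] = gbm.predict_proba(test[features])[:, 1]
test['rf_prob'] = rf.predict_proba(test[features])[:, 1]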

If this topic would be of interest, perhaps as a hands-on tutorial, then please leave a comment below. In the meantime, probably the first thing you need to do if you want to get involved is to install TensorFlow and Keras. Keras is a front end built on top of TensorFlow and provides simplified access to deep learning. You will need to have Anaconda Python installed, which, if you followed my earlier blog post on machine learning, you will already have; see here

https://markatsmartersig.wordpress.com/2016/01/13/profitable-punting-with-python-1/

Installing TensorFlow and Keras

First you need to create a new environment for your Keras-based programs. Pull up a command box (type command in the Windows search box).

Assuming you have Anaconda installed, enter the following command (note the double dash before name):

conda create --name deeplearning python

You can change deeplearning to whatever you'd like to call the environment. You'll be prompted to install various dependencies throughout this process; just agree each time.

Let’s now enter this newly created virtual environment. Enter the following command

activate deeplearning

The command prompt should now be prefixed by the name of the environment in parentheses; this indicates you're inside the new environment.

We now need to install into this new environment any libraries we may need, as they won't be accessible from the original root environment created when Anaconda was installed.

IPython and Jupyter are a must for those who rely on Jupyter notebooks for data science. Enter the following commands

conda install ipython
conda install jupyter

Pandas is the de facto library for exploratory analysis and data wrangling in Python. Enter the following command

conda install pandas

SciPy is an exhaustive package for scientific computing, but the namesake library itself is a dependency for Keras. Enter the following

conda install scipy

Seaborn is a high-level visualization library. Enter the following

conda install seaborn

Scikit-learn is the go-to library for machine learning tasks in Python outside of neural networks. Enter the following

conda install scikit-learn

We’re finally equipped to install the deep learning libraries, TensorFlow and Keras. Neither library is officially available via a conda package (yet) so we’ll need to install them with pip. One more thing: this step installs TensorFlow with CPU support only and not GPU support. Enter the following

pip install --upgrade tensorflow
pip install --upgrade keras

Check all is OK

Get Jupyter Notebook up and running by entering

jupyter notebook

Once you are in Jupyter, create a new notebook file and simply enter

from keras.models import Sequential
from keras.layers import Dense

Now run the above cell and hopefully all will be OK
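If you want a slightly stronger check than the imports alone, building a trivial model in the next cell will confirm that Keras can talk to TensorFlow (a throwaway network, just for the smoke test):

from keras.models import Sequential
from keras.layers import Dense

# a tiny two-layer network; if this builds and compiles without
# errors then Keras and TensorFlow are wired up correctly
model = Sequential()
model.add(Dense(8, activation='relu', input_dim=4))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()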

Should you at any point wish to remove the new environment simply use the following command

conda remove --name deeplearning --all

That's enough for now. If there is interest then we could perhaps explore the code in future sessions.
