I have been playing around with a Python library called PyGad. It is a Genetic Algorithm (GA) library that enables you from within the Python Programming language to create a Genetic Algorithm approach to Machine Learning. I have mentioned before how useful Gradient Boosting (GB) is for racing data due to its ability to handle in balanced data (ie far more losers in the data set than winners) plus they handle data that has not been normalized in any way quite well. However one disadvantage is that when you train a model using GB or any other Machine Learning algorithm you are essentially training the model to find winners rather than profit. To illustrate what exactly I mean by this, imagine a ridiculous example where we doubled the price of every horse above 20/1 and then also imagine we included the Betfair SP price in the data set we are training on. You would like to think that the model would spot that longer priced horses are the route to profit but it won’t simply because they still win less than 10/1 shots which win less than 8/1 shots which win les then 6/1 shots etc. The algoirthm will latch onto the fact that nothing predicts winners better than BFSP and it will focus on BFSP. This is why you should never include BFSP as an input feature.
One way around this is to create a custom loss function which forces the algorithm to train for profit rather than winners. This is easy to do in Python and PyGad so I set about investigating whether a GA trained to find profit, where profit was calculated to variable stakes, would outperforms a GB which is trained to find winners.
I trained both approaches on data from 2011 to 2017. The data was the data I use for my model submission to the Wisdom of models. The data for the GA model was normalized, instinct told me this would be the better option although I should test it without as well. The test data was 2018 to 2019.
The GB model made a ROI% to variable stakes on top rated horses of 2.68% after commision
The GA model made a ROI% of 0.30%
To my surprise the GB model solidly outperformed the GA model even though the GA model was trained to produce profit.
Overview of Gentic Algorithms