Over the weekend I have been playing around with a machine learning algorithm called LightGBM, produced by the teams at Microsoft. It is an algorithm from the Gradient Boosting family. I included GBM in the MySportsAI package but not LightGBM, mainly because a cursory read suggested its main advantage was speed of execution. I was not too fussed about that aspect, so I jumped to the conclusion that 'Light' perhaps meant light in predictive performance. I may have been wrong.

The main reason for picking it up this weekend is that LightGBM allows you to specify your own custom loss function. Allow me to explain. When an algorithm, be it GBM or logistic regression, is trying to produce the best model it can for future predictions, it does so by examining the data you have handed it to train on. It has to try different scenarios, just as you would if you were playing the party game 'guess who I am': am I male Y/N, am I European Y/N, and so on. Each time it constructs a model (think of that as a completed game) it needs some mechanism for evaluating the worthiness of the model. In fact it also needs some measure of worth to evaluate each stage of construction. I am not going to dissect the detail of this here, but usually it is some measure of accuracy. In horse racing terms this means: is it finding winners better than the last model it trained? Is the split it creates in the data better than other possible splits at dividing winners from losers?

Well, we all know as bettors that this is not the complete picture. Profit is what we are seeking, and often that comes hand in hand with fewer winners; go ahead and back all odds-on shots and you will have lots of winners but no profit. Creating your own custom loss function allows you to stop minimizing losers and start minimizing losses. The algorithm will use your loss function rather than one of the built-in loss functions that focus on winners.
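To make that concrete, LightGBM accepts a custom objective as a callable that returns the gradient and Hessian of your loss with respect to the raw model scores. The sketch below is purely illustrative, not the loss I actually used: it weights the standard logloss gradient by the money at stake at each runner's odds, so errors on well-priced winners cost the model more than errors on odds-on losers. The function name, the odds-based weighting, and the 2% commission figure are my own assumptions for the example.

```python
import numpy as np

def profit_weighted_logloss(preds, y_true, odds, commission=0.02):
    """Gradient and Hessian of a profit-weighted logloss.

    Each runner's logloss contribution is weighted by the money at
    stake for a 1-unit bet: winners by their net profit at decimal
    odds (after commission), losers by the 1 unit lost. This is an
    illustrative weighting scheme, not a recommended loss.
    """
    p = 1.0 / (1.0 + np.exp(-preds))  # raw scores -> probabilities
    # weight: net winning profit for winners, flat 1-unit stake for losers
    w = np.where(y_true == 1, (odds - 1.0) * (1.0 - commission), 1.0)
    grad = w * (p - y_true)           # weighted logloss gradient
    hess = w * p * (1.0 - p)          # weighted logloss Hessian
    return grad, hess
```

A callable with this (grad, hess) shape is what LightGBM's custom-objective hook expects; depending on the library version it is supplied either as the `fobj` argument to `lgb.train` or as a callable `objective` parameter, with the odds array closed over or read from the training Dataset.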
So how did LightGBM perform? I used three sets of data, each consisting of just 3 input features, and tested them all using standard GBM, LightGBM and what I will call here LightGBM+. The plus means I plugged in my custom loss function. All three used just default hyperparameters, and the data was for the flat from Handicaps 2011 to 2017. I used a train/test split and then checked the top-rated horse from the different scenarios to BFSP minus 2% commission. Here are the results.
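The evaluation step described above, backing the top-rated horse in each race at BFSP with 2% commission deducted from winning profits, can be sketched as follows. The column names (`race_id`, `score`, `bfsp`, `won`) are my own assumptions for illustration, and joint top-rateds are all backed, as the footnote below notes they should be.

```python
import pandas as pd

def top_rated_roi(df, commission=0.02):
    """ROI of 1-unit level stakes on the top-rated horse in each race
    at BFSP, with commission deducted from winning profits only.

    Rows tied on the highest score within a race (joint top-rateds)
    are all backed.
    """
    top = df.loc[df.groupby("race_id")["score"].transform("max") == df["score"]]
    profit = top.apply(
        lambda r: (r["bfsp"] - 1.0) * (1.0 - commission) if r["won"] else -1.0,
        axis=1,
    )
    return profit.sum() / len(top)  # total profit / total 1-unit stakes
```

For example, a frame with one 3.0-shot winner and one losing top-rated selection would return (2 × 0.98 − 1) / 2 = 0.48, i.e. a 48% ROI on those two bets.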

As you can see, on the first set of data LightGBM came in at an ROI of just under 2%, whereas LightGBM+ achieved over 6% and GBM trailed in at nearly -2%.
This looks like a promising area of further research and I will look to place LightGBM into MySportsAI along with a custom loss alternative.
FootNote 1 – The variable-stake return on that over-6% run was +4.78%.
FootNote 2 – Training on the whole of 2011 to 2017 and testing on 2018/19 showed little difference between LightGBM and LightGBM+, but both were around 1% better in ROI to variable stakes than GBM.
FootNote 3 – The ROI was not over 6% for LightGBM+ on the first data set but actually +5.01%; I had omitted to count joint top-rateds. Still well above the other models, however. Also, the variable-stake return on this first data set for LightGBM+ was +3.29%.