The new kid on the block in the world of machine learning is deep learning, but among the shallow learners the method kicking ass in the world of Kaggle is Gradient Boosting, and I have to admit I am a convert.

Up until now I have not quite seen the gains from ML techniques over non-ML methods of forming ratings, such as A/E values. The old adage that data is king, and that no method will polish a turd to such an extent that it shines, has never been in doubt to me. Gains from ML techniques have seemed minimal, but that may be about to change in my world.

I fed a set of data into a scikit-learn Random Forest machine learning algorithm. If you are not familiar with Random Forests, they are basically a tree-based algorithm, but their strength comes from the fact that they form multiple trees, each on a random subsection of the input data. Branches are also split on random subsets of the available fields within the data. This means that a single, more dominant feature or field is less likely to swamp the decision making of the trees and hence bias the overall model towards one field in the data.
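To make that concrete, here is a minimal sketch of fitting a scikit-learn Random Forest. The data here is synthetic and the parameter choices are purely illustrative — the post does not say which fields or settings were actually used. Note how `max_features` controls the random subset of fields considered at each split, the mechanism described above.

```python
# Minimal Random Forest sketch with scikit-learn (illustrative data only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))  # stand-in for race/runner fields
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 200 trees is grown on a bootstrap sample of the rows;
# max_features="sqrt" means each split only considers a random subset
# of the columns, so no single field can dominate every tree.
model = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", random_state=0
)
model.fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))
```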

Working with BFSP (Betfair Starting Price) as the model evaluator, I first checked the performance of top-rated horses using the model on fresh, unseen data. After commission this produced

8031 bets, PL -33.6 pts, ROI -0.41%, VarPL -3.96, VarROI -0.31%

Now, creating a Gradient Boosting model and applying it to the same data produced

9250 bets (more joint tops), PL +713.7 pts, ROI +7.7%, VarPL +100.5, VarROI +5.54%

Splitting at roughly the midpoint of the rating values, so that the runners divide roughly in half, the Gradient Boosting method produced

33536 bets, PL +861 pts, ROI +2.56%

By contrast, the Random Forest produced

33155 bets, PL -784 pts, ROI -2.36%
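For readers new to points-based results: assuming level 1-point stakes, the ROI percentages above follow directly from the points profit/loss divided by the number of bets. A tiny sketch of that arithmetic, using two of the figures quoted above:

```python
# ROI (%) = 100 * points profit / number of bets, assuming 1 pt level stakes.
def roi(pl_points: float, n_bets: int) -> float:
    """Return on investment as a percentage for level 1-point stakes."""
    return 100.0 * pl_points / n_bets

print(round(roi(713.7, 9250), 2))    # Gradient Boosting top-rated: 7.72
print(round(roi(-784.0, 33155), 2))  # Random Forest lower split: -2.36
```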

How does Boosting differ from plain old Random Forests? Random Forests rely on a technique called bagging. A random selection of the input data is placed in a 'bag' and a tree is grown on this bag, or subset, of data. Another bag is then drawn and another tree grown on it, and so on. The results are then averaged across all the trees to produce a final prediction. With Boosting, however, the trees are built one after another and an extra step takes place. When the first set of data has been analysed, weights are handed to the data items ready for the next round. Those items that were predicted poorly the first time are prioritized, so that the next tree concentrates on them in the hope that they will be predicted more accurately (in Gradient Boosting specifically, each new tree is fitted to the remaining errors of the ensemble built so far). It is rather like you doing a random set of revision questions and then, when I select a second random set for you to try, I increase the chance of selecting the ones you got wrong in the first revision test.
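The contrast above can be seen side by side in scikit-learn, where both ensembles share the same `fit`/`score` interface. This is a hedged sketch on synthetic data — the parameters and data are illustrative, not the ones behind the results quoted earlier, and which method wins will depend on the problem.

```python
# Bagging (Random Forest) vs boosting (Gradient Boosting) on the same data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 6))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2
     + rng.normal(scale=0.5, size=800) > 1).astype(int)

# Random Forest: independent trees on bootstrap samples, predictions averaged.
rf = RandomForestClassifier(n_estimators=300, random_state=0)

# Gradient Boosting: trees built sequentially, each one fitted to the
# errors of the ensemble so far, shrunk by the learning rate.
gb = GradientBoostingClassifier(
    n_estimators=300, learning_rate=0.05, random_state=0
)

rf_acc = cross_val_score(rf, X, y, cv=5).mean()
gb_acc = cross_val_score(gb, X, y, cv=5).mean()
print(f"RF {rf_acc:.3f}  GB {gb_acc:.3f}")
```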

The following short video does a pretty good job of describing the process. Needless to say, the above results have sparked my interest.