I have spent the last few days working on a Random Forest version of my own flat handicap ratings. The original ratings are based on AE values, or Actual divided by Expected values to give them their full name.

Let me remind you of what AE values are. Say we are calculating the AE value for last time out (LTO) winners. We take all LTO winners and, for each horse, calculate its market chance by taking its SP or BFSP, stripping out the over-round and converting the resulting odds into a chance of winning. So an even money shot should win half the time (a chance of 0.5) if the odds are true. We sum these win chances across all the horses to give an E (expected) value, and count the actual wins for the same horses to give an A (actual) value. If A divided by E is greater than 1 then, in our example, last time out winners are winning more often than the market estimates. If the value is less than 1 then the market is over betting them.
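For anyone who prefers to see the calculation as code, here is a minimal sketch of the AE idea. The function names and the sample figures are made up for illustration, and stripping the over-round by scaling every runner's chance proportionally is an assumption (other methods exist).

```python
def implied_probabilities(decimal_odds):
    """Turn one race's decimal odds into win chances with the over-round removed.

    Assumption: the over-round is stripped by proportional scaling.
    Decimal odds of 2.0 correspond to an even money shot.
    """
    raw = [1.0 / odds for odds in decimal_odds]
    book = sum(raw)  # greater than 1.0 when the book has an over-round
    return [p / book for p in raw]


def ae_value(runners):
    """runners: list of (win_chance, won) pairs for the qualifying horses."""
    actual = sum(1 for _, won in runners if won)        # A: number of wins
    expected = sum(chance for chance, _ in runners)     # E: sum of win chances
    return actual / expected


# Hypothetical four-runner race priced to a 112.5% book
race_odds = [2.0, 4.0, 4.0, 8.0]
probs = implied_probabilities(race_odds)

# Hypothetical LTO winners: (market win chance, did it win?)
qualifiers = [(0.5, True), (0.25, False), (0.25, True), (0.5, False), (0.5, True)]
ae = ae_value(qualifiers)  # 3 wins against 2.0 expected gives 1.5
print(round(ae, 2))
```

An AE of 1.5 here would mean this made-up group won half as often again as the market implied.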

I trained a Random Forest model on my data for 2009 to 2013 and then tested on the years 2014 and 2015. The original AE model produced the following results for top rated horses.
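The train and test split by year can be sketched roughly as follows with scikit-learn. This is only an illustration on randomly generated data: the three features, the forest settings (n_estimators, min_samples_leaf) and the variable names are all assumptions, not the inputs or parameters of the actual ratings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000

# Made-up data: a season for each run, three invented features, a win flag
years = rng.integers(2009, 2016, size=n)        # 2009 to 2015 inclusive
X = rng.normal(size=(n, 3))
win_chance = 1.0 / (1.0 + np.exp(-(X[:, 0] - 2.0)))
y = (rng.random(n) < win_chance).astype(int)

# Train on 2009 to 2013, hold back 2014 and 2015 for testing
train = years <= 2013
test = years >= 2014

# Hypothetical settings; larger leaves tend to give smoother probabilities
model = RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0)
model.fit(X[train], y[train])

# The predicted win probability serves as the rating for each test run
ratings = model.predict_proba(X[test])[:, 1]
print(ratings[:5])
```

With real data the top rated horse would be the runner with the highest rating within each race, which needs a race identifier that this sketch leaves out.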

Bets 7918 Wins 1276 PL +305 to BFSP after comm’ ROI +3.2%

The Random Forest model produced the following results for top rated horses.

Bets 7699 Wins 1164 PL +323 to BFSP after comm’ ROI +4.1%

The software used was Python with the scikit-learn Random Forest library. See my intro blog entry on this software.

My initial interest in this area stems from an excellent article by Stefan Lessmann, which is linked below.

http://www.sciencedirect.com/science/article/pii/S0169207009002143

The next step for me is to extend the model by following Lessmann and co’s example and adding a second step, in which the resulting RF ratings are combined with the market price of each horse using regression, eventually producing an oddsline. Of course BFSP is not known until after the off, but final prices can be a good estimate. Lessmann and Benter argue for this two-step separation of fundamental race parameters and odds because it stops the odds swamping the model parameters, as can happen when they are all used together in a single step.
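A rough sketch of what such a second step might look like is given below. It is not the method from the article: ordinary logistic regression from scikit-learn is swapped in for simplicity, with the probabilities rescaled to sum to 1 within each race afterwards. All of the data is randomly generated and every variable name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_races, field = 300, 8
race_id = np.repeat(np.arange(n_races), field)

# Made-up inputs: a step one rating and a market win chance per runner
rf_rating = rng.random(n_races * field)
raw = rng.random((n_races, field)) + 0.05
market_prob = (raw / raw.sum(axis=1, keepdims=True)).ravel()

# Made-up results: one winner per race, drawn according to the market chance
won = np.zeros(n_races * field, dtype=int)
for r in range(n_races):
    start = r * field
    winner = rng.choice(field, p=market_prob[start:start + field])
    won[start + winner] = 1

# Step two: regress the result on the rating and the log of the market chance
X2 = np.column_stack([rf_rating, np.log(market_prob)])
step2 = LogisticRegression().fit(X2, won)
p = step2.predict_proba(X2)[:, 1]

# Rescale so each race sums to 1, then convert to decimal odds
race_totals = np.bincount(race_id, weights=p)
p_norm = p / race_totals[race_id]
oddsline = 1.0 / p_norm
print(oddsline[:field].round(2))
```

In practice the second step would be fitted on one set of races and applied to later ones, rather than fitted and scored on the same data as it is here.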

I should also perhaps look at some more metrics on this model first, as it may not have escaped your notice that the win rate on the AE model is greater than that of the RF model. Lessmann puts up some strong arguments for Random Forests in his article, so if you are interested in race modelling it might be worth taking a look.
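To put numbers on that win rate gap, the strike rates can be worked out directly from the bets and wins quoted above. The dictionary layout is just for illustration.

```python
# Bets and wins for top rated horses, as quoted in the results above
results = {"AE": (7918, 1276), "RF": (7699, 1164)}

# Strike rate is simply wins divided by bets
strike = {name: wins / bets for name, (bets, wins) in results.items()}

for name, rate in strike.items():
    print(f"{name}: {rate:.1%}")
```

That works out at roughly 16.1% for the AE model against 15.1% for the RF model. A lower strike rate alongside a higher profit would suggest the RF winners are returned at longer prices on average, though that is only an inference from these totals.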