Stef, the original creator of Smartsig, produced a set of ratings using a neural network. The ratings were based on each horse's finishing positions in its last 3 runs, along with the days since its last run. This data was fed into a separate neural network for each of NH hurdles, NH chases, AW flat races and AW turf races. A typical line of data would look something (I guess) like

5 1 3 76 0

This shows that a horse finished outside the first four in its third-last race (all placings worse than 4th are represented as 5), 1st in its second-last race and 3rd in its last race, that it last ran 76 days ago, and that it did not win the coming race.
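As a rough illustration of that layout, here is a small Python sketch that builds one such line. The function names `encode_placing` and `encode_row` are my own inventions for the example, and the raw finishing position of 8th is made up; I do not know how Stef actually prepared his files.

```python
def encode_placing(position, cap=5):
    """Code a finishing position as 1 to 4, with anything worse coded as `cap`."""
    return min(position, cap)


def encode_row(last_three_positions, days_since_run, won):
    """Build one line: three coded placings (oldest first), days since last run, win flag."""
    placings = [encode_placing(p) for p in last_three_positions]
    return placings + [days_since_run, int(won)]


# A horse that was 8th, then 1st, then 3rd, ran 76 days ago and did not win next time.
row = encode_row([8, 1, 3], 76, False)
print(" ".join(str(v) for v in row))  # prints: 5 1 3 76 0
```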

These ratings, to my mind, were not intended as a point-and-fire set of ratings but more as an illustration of how AI can be used, and perhaps even as a starting point for further study using either traditional form study or AI methods. They have been published daily on the web site with Stef’s permission.

I thought it was perhaps time to play around with them a little further and attach some performance figures to them. I was wondering whether the above representation was indeed the best configuration. I chose to use a Random Forest as a machine learning vehicle simply because, at the time, scikit-learn did not have a readily available NN module.

The first thing I did was create a file for AW handicaps based pretty much on Stef’s layout, with placings coded 1 to 5, where 5 means anything outside the first 4. Days since last run were left as is. It is inevitable that some horses will not have 3 runs, and in these cases I opted, within Python, to replace the missing values with the mean of the whole column. So a missing third run would be replaced with the mean of all third runs of all horses in the set. This is needed because scikit-learn, unlike R, does not allow missing values.
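For anyone wanting to try the same thing, the column-mean replacement can be sketched in plain Python as below. The helper name `impute_column_means` and the three sample rows are hypothetical, and missing runs are assumed to be stored as `None`; with real data the means would normally be taken from the training set only.

```python
from statistics import mean


def impute_column_means(rows):
    """Replace None in each column with the mean of that column's known values."""
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        known = [r[c] for r in rows if r[c] is not None]
        col_means.append(mean(known))  # raises if a column has no known values
    return [
        [col_means[c] if v is None else v for c, v in enumerate(r)]
        for r in rows
    ]


# Columns: third-last placing, second-last placing, last placing, days since last run.
rows = [
    [5, 1, 3, 76],
    [None, 2, 4, 14],  # a horse with only two previous runs
    [1, 5, 2, 30],
]
filled = impute_column_means(rows)
print(filled[1])  # the missing third-last placing becomes the mean of 5 and 1
```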

The next step was to train the forest on 2011 to 2013 data. Once this was done I tested the model on 2014 to mid 2016. I was hoping that, at BFSP, top rated horses might get close to break even, as I recall that the original AI ratings top rated lose about 8 or 9% to bookie SP. I was pleasantly surprised to find the following

Top rated: 4103 bets, PL after 5% comm +215.3, ROI 5.24%

The bottom rated horses produced

7384 bets, PL -927, ROI -12.55%
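The train, rate and settle steps described above could be sketched along the following lines. Everything here is an assumption for illustration: the data is randomly generated by a made-up `make_fake_runners` helper (so the printed result has no racing meaning), the forest settings `n_estimators` and `min_samples_leaf` are arbitrary, bets are 1 point level stakes, and the 5% commission is simply taken off each winning bet's profit. ROI is profit divided by total stakes, which matches the quoted figures to within rounding.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def make_fake_runners(n_races, runners_per_race=8):
    """Synthetic stand-in for the real files; the results carry no racing meaning."""
    n = n_races * runners_per_race
    X = np.column_stack([
        rng.integers(1, 6, size=(n, 3)),   # three placings coded 1 to 5
        rng.integers(1, 200, size=n),      # days since last run
    ])
    race_id = np.repeat(np.arange(n_races), runners_per_race)
    y = np.zeros(n, dtype=int)
    winners = rng.integers(0, runners_per_race, size=n_races)
    y[np.arange(n_races) * runners_per_race + winners] = 1  # one winner per race
    bfsp = rng.uniform(2.0, 20.0, size=n)  # made-up Betfair SPs
    return X, y, race_id, bfsp


def bet_profit(bfsp, won, commission=0.05):
    """Profit on a 1 point bet, with commission taken from winnings only."""
    return (bfsp - 1.0) * (1.0 - commission) if won else -1.0


def roi_percent(profit, n_bets):
    """Profit as a percentage of total stakes at 1 point per bet."""
    return 100.0 * profit / n_bets


X_train, y_train, _, _ = make_fake_runners(300)                # stands in for 2011 to 2013
X_test, y_test, race_test, bfsp_test = make_fake_runners(100)  # stands in for 2014 to mid 2016

model = RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0)
model.fit(X_train, y_train)

# Rating = predicted probability of the "winner" class.
rating = model.predict_proba(X_test)[:, 1]

# Top rated = the highest-rated runner in each race (first one wins any tie).
top_rated = []
for r in np.unique(race_test):
    idx = np.flatnonzero(race_test == r)
    top_rated.append(int(idx[np.argmax(rating[idx])]))

profit = sum(bet_profit(bfsp_test[i], y_test[i] == 1) for i in top_rated)
roi = roi_percent(profit, len(top_rated))
print(f"{len(top_rated)} bets, PL {profit:+.1f}, ROI {roi:.2f}%")
```

Bottom rated selections would be picked the same way with `np.argmin` in place of `np.argmax`.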

Encouraged by this, I went on to try a modification to the placings data, expressing the position of a horse in a race as a fraction of the number of runners in the race. So first of 2 would be 0.5 whilst first of 10 would be 0.1. Placings were not cut off after 4, so for example 5th of 10 would be 0.5. I was hoping that this extra information would produce better results but, as is often the case in this game, more can mean less.
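In code this encoding is a one-liner. The function name `fractional_placing` is just my label for it here.

```python
def fractional_placing(position, runners):
    """Finishing position divided by the number of runners in that race."""
    return position / runners


print(fractional_placing(1, 2))   # first of 2   -> 0.5
print(fractional_placing(1, 10))  # first of 10  -> 0.1
print(fractional_placing(5, 10))  # fifth of 10  -> 0.5
```

Note that first of 2 and 5th of 10 end up with the same value, so the model cannot tell those two runs apart from this column alone.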

Top rated: 4254 bets, PL +5.7, ROI 0.13%

Finally, I tried a hybrid of the above two methods. Placings of 1st, 2nd, 3rd and 4th would be expressed as a fraction of the total runners in the race, whilst 5th or worse would be represented as 1. This produced the following results
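A minimal sketch of the hybrid coding, again with a function name (`hybrid_placing`) chosen purely for the example:

```python
def hybrid_placing(position, runners):
    """First four home as a fraction of the field; anything worse is coded as 1."""
    if position <= 4:
        return position / runners
    return 1.0


print(hybrid_placing(2, 8))   # second of 8   -> 0.25
print(hybrid_placing(4, 16))  # fourth of 16  -> 0.25
print(hybrid_placing(7, 16))  # seventh of 16 -> 1.0
```

One side effect of coding it this way is that 4th of 4 also comes out as 1.0, the same as any placing of 5th or worse.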

Top rated: 4118 bets, PL +68.5, ROI 1.66%

If the comments below show there is interest in these ratings, I would be happy to produce them alongside the AI ratings and maybe extend them into other codes of racing. Any feedback below is most welcome.