The classical way of looking at trainer form is to check how well a trainer has done over the last X runs. Often this amounts to simply looking at the win-to-run ratio, but sometimes it is refined further to count ‘good’ runs or placed runs. But is there a more refined way of looking at trainer form? What I am getting at is this: how well does M Botti do when running a 3yo that has been off 71 days in a class 5 handicap? What if we add further criteria, perhaps stipulating that the animal is male rather than female? There are all kinds of criteria we could come up with, but it gets messy, and even if you do not think so the question remains: how do we evaluate his runs with such animals?

Machine Learning can help in this situation. K Nearest Neighbors (KNN) is one of the simplest algorithms to understand. Imagine we simply focus on Botti’s runners that are 3yos, or as near as possible to 3yo, and off 71 days, or as near as possible to 71 days. It would be great if Botti had a multitude of such previous runners, but of course he won’t. KNN will instead search for the sample of past runs nearest to these values, and the size of that sample is set by us when we run a KNN program. I performed this task on the last race at Wolves on Saturday 26th November 2022. I trained the KNN algorithm on Botti’s data from 2011 to 2019 for class 5 and 6 races and then ran a prediction on Botti’s runner in the last at Wolves. Normally the algorithm would predict the chances of Botti having a winner with a 3yo off 71 days. However, I wanted to refine the prediction somewhat. I actually accessed the 21 nearest neighbors from 2011 to 2019 (I specified it should look for the 21 nearest instances) and then, instead of lengths beaten for each animal, I looked at pounds beaten and compiled an average. I did this for all trainers in the last race at Wolves and then ranked them, with the smallest average of course being the best ranked in the race. I also graphed the individual trainers’ nearest neighbors; here are a couple.

At first glance Appleby’s graph looks better, but of course the vertical scale is different, although he does have a large outlier.
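For anyone who wants to try the ranking step, here is a minimal sketch of the idea rather than the program I actually ran. It assumes each trainer's past runs are already held as (age, days since last run) pairs with a pounds beaten figure for each run; the function names and the tiny data set are invented purely for illustration, and it uses k=3 instead of 21 so the numbers can be checked by hand. The inputs are left raw and unscaled.

```python
import numpy as np

def average_pounds_beaten(features, pounds_beaten, query, k=21):
    """Mean pounds beaten over the k past runs closest to the query runner."""
    features = np.asarray(features, dtype=float)
    pounds_beaten = np.asarray(pounds_beaten, dtype=float)
    k = min(k, len(features))  # a trainer may have fewer than k past runs
    # Euclidean distance from the query runner to every past run
    distances = np.linalg.norm(features - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    return pounds_beaten[nearest].mean()

def rank_trainers(histories, queries, k=21):
    """Rank trainers by average pounds beaten, smallest (best) first."""
    scores = {
        name: average_pounds_beaten(runs, beaten, queries[name], k)
        for name, (runs, beaten) in histories.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Invented data: columns are (age, days since last run), then pounds beaten.
histories = {
    "Trainer A": ([[3, 70], [3, 75], [4, 200], [5, 10], [3, 60]], [2, 4, 20, 15, 6]),
    "Trainer B": ([[3, 72], [4, 71], [3, 30], [6, 300]], [10, 8, 1, 0]),
}
# Today's runner for each trainer: a 3yo off 71 days in both cases.
queries = {"Trainer A": [3, 71], "Trainer B": [3, 71]}

ranking = rank_trainers(histories, queries, k=3)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With this made-up data Trainer A comes out top because the three runs nearest to a 3yo off 71 days were beaten by fewer pounds on average than Trainer B's.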

There is lots more work that can be done on this idea. Certainly the two inputs above should be normalised to lie between 0 and 1; otherwise the algorithm will give more weight to a difference of, say, 10 days in days since last run than to a difference of one or two years in age. This would lead to days since last run dominating the selection of the 21 nearest neighbors.
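To show why this matters, here is a small sketch of min-max scaling, which is one common way to normalise and not necessarily the one I would end up using. The function name and the four past runs are invented for illustration. Each column is rescaled using the minimum and maximum seen in the past runs, and the same transformation is applied to today's runner.

```python
import numpy as np

def min_max_scale(features, query):
    """Rescale each column to 0..1 using the range seen in the past runs."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # Guard against a constant column, which would otherwise divide by zero
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled_query = (np.asarray(query, dtype=float) - lo) / span
    return (features - lo) / span, scaled_query

# Invented past runs: columns are (age, days since last run).
runs = np.array([[3, 61], [5, 71], [2, 1], [6, 201]], dtype=float)
query = np.array([3, 71], dtype=float)  # a 3yo off 71 days

# Raw inputs: days since last run dominates the distance.
raw_nearest = int(np.argmin(np.linalg.norm(runs - query, axis=1)))

# Scaled inputs: age and days since last run carry comparable weight.
scaled_runs, scaled_query = min_max_scale(runs, query)
scaled_nearest = int(np.argmin(np.linalg.norm(scaled_runs - scaled_query, axis=1)))

print("nearest on raw inputs:   ", runs[raw_nearest])
print("nearest on scaled inputs:", runs[scaled_nearest])
```

On the raw inputs the nearest past run is the 5yo off 71 days, because a two-year gap in age counts for less than a ten-day gap in days off. After scaling, the nearest is the 3yo off 61 days. Note that a runner whose values fall outside the range of the past runs will scale to something below 0 or above 1.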

Does this approach have any legs? Well, I trained on 2011 to 2015 and tested on 2016/17 for all handicap races, using just a couple of input fields for trainers, of which days since last run was one. During 2016/17 the top ranked made a variable stake profit of +19.1 points, whilst the second ranked made +15.6.

In this race Botti is top ranked and Appleby is second. Good luck with whatever you are on.

Comments are welcome, and don’t forget to rate the article.