First up, a thank you to my friend Steve Tilley for pointing me towards PRIM; I must admit I had not heard of it until he tweeted me. PRIM stands for Patient Rule Induction Method, so what has it got to do with building race betting systems? If you have had a go at building a system, you will be familiar with being faced with an array of variables to choose from. Examples would be days since the horse last ran, whether it is a course and/or distance winner, and so on. Picking the right combination of variables is one part of the task, but picking the best range within each variable is an additional one. For example, are you better off picking horses that last ran between 10 and 15 days ago, or is 15 to 30 the sweet spot?

The algorithm for PRIM revolves around ‘Peeling’ and ‘Pasting’. I am going to focus on Peeling, which essentially involves shrinking the entire data set gradually. Each step of shrinkage removes a small slice of rows from one edge of the data, chosen so that the rows left behind are the most ‘worthy’ subset available. Of course ‘worthy’ means different things in different applications, but with the algorithm you can specify which measure of worthiness to rank the subsets on.
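To make the peeling idea concrete, here is a minimal sketch for a single variable. It is not the code of any PRIM library: the function name `peel_1d`, the `alpha` and `min_support` parameters and the toy data are all assumptions made for illustration. Each pass trims a fraction `alpha` of rows from either the low or the high end, whichever leaves the better mean score behind, and records the box at every step.

```python
from statistics import mean

def peel_1d(values, scores, alpha=0.05, min_support=0.1):
    """Peel one variable; return a list of (low, high, n_rows, mean_score)."""
    # Sort rows by the variable only (stable, so tied values keep input order).
    # Simplification: a slice of k rows may split a group of tied values.
    rows = sorted(zip(values, scores), key=lambda row: row[0])
    min_rows = max(1, int(len(rows) * min_support))

    def describe(box):
        return (box[0][0], box[-1][0], len(box), mean(s for _, s in box))

    trajectory = [describe(rows)]
    while len(rows) > min_rows:
        k = max(1, int(len(rows) * alpha))   # rows to peel this step
        if len(rows) - k < min_rows:
            break
        without_low = rows[k:]               # peel the low end
        without_high = rows[:-k]             # peel the high end
        # Keep whichever peel leaves the higher mean score behind.
        if mean(s for _, s in without_low) >= mean(s for _, s in without_high):
            rows = without_low
        else:
            rows = without_high
        trajectory.append(describe(rows))
    return trajectory

# Toy data (invented): days 1..100, with a profitable pocket at 20..40 days.
days = list(range(1, 101))
scores = [1.0 if 20 <= d <= 40 else -0.5 for d in days]

trajectory = peel_1d(days, scores)
# Pick the box with the best mean, preferring the larger box on ties.
best = max(trajectory, key=lambda box: (box[3], box[2]))
print(best)
```

The step size `alpha` is what makes the method "patient": small peels mean many steps and less chance of cutting away a good region by accident.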

Let me be a bit more specific. Take days since last ran. I am going to code my test of worthiness as a profit or loss variable, in other words the profit or loss when staking to win £1 at the odds on each bet. The algorithm will now search the data space on days since last ran, using small incremental steps (which we can specify), until it finds the optimum in terms of (in our case) profit. Let us imagine it finds the most profitable range to be between 2 and 6 days. It will then remove this subset from the overall data and repeat the process to find the second best fit, and so on.
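For anyone unsure how a "stake to win £1" profit or loss column might be coded, here is one possible sketch. It assumes decimal odds and a commission charged on winnings only; the function name `pl_to_win_one` and the `commission` parameter are hypothetical, not taken from MySportsAI.

```python
def pl_to_win_one(decimal_odds, won, commission=0.0):
    """Profit or loss for one bet staked to win 1 unit at decimal odds."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    # Stake needed so that a winning bet returns exactly 1 unit of profit.
    stake = 1.0 / (decimal_odds - 1.0)
    if won:
        # Assumption: commission is deducted from the winnings only.
        return 1.0 * (1.0 - commission)
    return -stake

win_pl = pl_to_win_one(3.0, True)     # winner at 3.0, no commission
lose_pl = pl_to_win_one(3.0, False)   # loser at 3.0 costs the 0.5 stake
```

Staking this way keeps long-priced winners from dominating the profit figure, since every winner contributes the same amount and only the losing stakes vary with the odds.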

It is therefore possible to use PRIM to find the first best fit on a series of variables of interest. Which variables, you may ask? I would suggest bearing in mind that consistency of profit is important. A variable with wild swings may be harder to tolerate, even if more profitable overall, than a lesser variable that is consistent, say, across years.
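One rough way to check that kind of consistency is to total the profit by year and see how many years ended in profit. The sketch below is only an illustration: the function names and the two sets of figures are invented, not results from my data.

```python
from collections import defaultdict

def yearly_pl(bets):
    """Sum profit or loss per year from (year, pl) pairs."""
    totals = defaultdict(float)
    for year, pl in bets:
        totals[year] += pl
    return dict(sorted(totals.items()))

def share_of_profitable_years(totals):
    """Fraction of years that finished in profit."""
    return sum(1 for pl in totals.values() if pl > 0) / len(totals)

# Invented figures: one steady variable, one with wild swings.
steady = [(2011, 2.0), (2012, 1.5), (2013, 2.5), (2014, 1.0)]
swingy = [(2011, 20.0), (2012, -8.0), (2013, -6.0), (2014, 4.0)]

steady_totals = yearly_pl(steady)
swingy_totals = yearly_pl(swingy)
print(sum(steady_totals.values()), share_of_profitable_years(steady_totals))
print(sum(swingy_totals.values()), share_of_profitable_years(swingy_totals))
```

Here the swingy variable makes more overall, but only half of its years are profitable, which is the trade-off described above.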

I applied PRIM to the following variables from MySportsAI data for 2011 to 2016, for handicaps only:

['daysLto','prevLto','TRSR','TRinrace','avgBeat','runnersRatio','pdsBtnLto','daySinceGR','Jockinrace','SireSurf','IPDropPercent']

I then took the top located segment from each of the top two variables, ranked on the mean measure reported by PRIM. (Note: I am not sure at this stage what the mean measure is, but it goes gradually down for each variable as it reports the located segments from top to bottom.) Using this as a system applied to 2016/17 produced the following.

Just using the top ranked variable, daysSinceGR (days since good run), produced:

980 bets PL after comm +26.1 to BFSP

Applying the top two variable segments together produced:

119 bets PL + 9.43
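Applying two segments together simply means a runner has to fall inside both ranges to qualify. A minimal sketch is below. The bounds are placeholders and not the segments PRIM actually located, and `daysLto` is picked arbitrarily from the variable list to stand in for the second variable.

```python
def in_all_segments(runner, segments):
    """True if the runner's value lies inside every (low, high) segment."""
    for name, (low, high) in segments.items():
        value = runner.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True

# Placeholder bounds for illustration only.
segments = {
    "daySinceGR": (10, 40),
    "daysLto": (7, 21),
}

# Invented runners.
runners = [
    {"horse": "A", "daySinceGR": 15, "daysLto": 14},   # inside both
    {"horse": "B", "daySinceGR": 15, "daysLto": 60},   # fails daysLto
    {"horse": "C", "daySinceGR": 90, "daysLto": 10},   # fails daySinceGR
    {"horse": "D", "daySinceGR": None, "daysLto": 10}, # missing value
]

selections = [r["horse"] for r in runners if in_all_segments(r, segments)]
print(selections)
```

Each extra segment can only reduce the number of qualifiers, which is why the bet count drops so sharply when the second variable is added.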

Note to readers, after a Twitter user baulked at the profit in one of my articles: articles like this are more about informing the reader of a method or a piece of software, and the PL is just a final line of information. It is intended to spark the reader's interest, not deliver a winning system on a plate. As I often say to people, reading Nick Mordin when he used to publish weekly was not really about finding a golden goose but more about being influenced by a way of thinking. The most valuable things I have learnt over the years have been about thinking and not a specific winning strategy.

You can find out more about PRIM from this article

https://towardsdatascience.com/find-unusual-segments-in-your-data-with-subgroup-discovery-2661a586e60c#

Please do not forget to rate the article; feedback is welcome in the comments.