Imagine betting in the 1970s and 80s. If you cannot remember back to that time, let me guide you there, and specifically to the world of horse betting. To make selections you would either buy the Sporting Life or pop down to the local bookmakers and read the form off the bookmaker's walls. This still happens today, but anybody reading form off a bookmaker's wall these days is seen as a dinosaur, the bottom of the betting food chain. Now imagine you were back in the 70s and the main source of analysis, namely the racing form pullouts, was available only to a select few, while those outside this club had to rely on newspaper tipster selections or what they saw on unrecorded TV coverage. What chance would this vast majority have of approaching break-even, let alone a profit?
When racing data became available around the turn of the century, this split between the haves and the have-nots gained another division. The video-recording punters suddenly needed to evolve: data gathering and analysis skills were needed, otherwise they were in danger of being left behind. We are now in an era of a new division of punters. Machine Learning has enabled machines to beat chess champions and, more recently, to conquer Go, a game far more complex than chess in terms of permutations.
It is inevitable, therefore, that Machine Learning will play an increasing role in sports betting analysis. Unlike previous evolutions in betting, however, ML poses a steeper learning curve than learning how to control a video recorder or clicking some buttons to outrageously back-fit a racing system. Producing ML models requires some coding skills, and even if you have coding skills you will need to adopt one of the main programming languages for this work, such as R or Python. For many this will be a major investment of time. With this in mind I set about creating a Graphical User Interface based software package that allows the user to create ML models on horse racing data. You need no coding skill to use it, nor do you need in-depth knowledge of Machine Learning, although you most certainly will pick up aspects of this field through using it.
Perhaps I am getting ahead of myself here. Some of you may be asking what exactly Machine Learning is. Let me compare it with system building, which most people will be familiar with. Imagine we have just three input variables: the jockey strike rate, the trainer strike rate and the horse's sire strike rate. A system builder will try a multitude of combinations, such as trainer strike rate greater than 12 coupled with sire strike rate greater than 10 along with jockey strike rate greater than 8. If the results this produces look profitable then, hey presto, he has a system. There are lots of potential dangers with this approach. First of all, is it the optimum balance? Has he constructed it on some past data and then tested it on some fresh data to see how it works? Finally, even if it's robust, it will only produce single bets in a race, and in many races no bets at all.
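To make the system builder's approach concrete, here is a minimal sketch of that rule in Python. The thresholds are the ones mentioned above; the function name and the three runners are made up purely for illustration.

```python
def system_qualifies(trainer_sr, sire_sr, jockey_sr):
    """Fixed-threshold rule from the text; all values are percentages."""
    return trainer_sr > 12 and sire_sr > 10 and jockey_sr > 8

# Hypothetical runners in one race: (name, trainer SR, sire SR, jockey SR).
race = [
    ("Horse A", 14.0, 11.0, 9.0),
    ("Horse B", 13.0, 9.0, 12.0),
    ("Horse C", 8.0, 15.0, 10.0),
]

# A runner is either a bet or it is not; there is no ranking of the rest.
bets = [name for name, t, s, j in race if system_qualifies(t, s, j)]
print(bets)  # ['Horse A']
```

Notice that the rule says nothing about Horse B or Horse C beyond "no bet", and in a race where nothing clears all three thresholds the list would simply be empty.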
Let me contrast that with an ML approach using one of the simplest ML algorithms to conceptualize, K Nearest Neighbor (KNN). The ML approach with this data would typically be to split it into 80% and 20% partitions, train the model on the 80% and then test how the model performs on the 20%. Training involves creating a model that searches for the K (let's say 9, you can set this number) nearest or most similar patterns in the data to the pattern it is trying to predict. If it is trying to predict the result of a horse with a trainer strike rate (TRS) of 14%, a jockey strike rate (JOSR) of 10% and a sire strike rate (SISR) of 7%, it will search the data space for the 9 examples that come closest to matching this pattern and then look at whether they won or lost. It will then use the 9 results in a vote to create a probability, e.g. if 3 won then the probability would be 3/9, or 0.33 (33%). This means that for every race you try to predict going forward you will have a probability for each horse, giving you a rank order in the race, i.e. a rating.
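For those curious what this looks like in code, here is a rough sketch of the KNN idea in plain Python. It is not how my software is implemented; the data is randomly generated, the function name is my own invention for this example, and I have assumed straight-line (Euclidean) distance as the measure of "nearest".

```python
import math
import random

def knn_win_probability(train, query, k=9):
    """Share of winners among the k training rows nearest to query.

    train: list of (features, won) pairs, where features is a tuple
           (trainer_sr, jockey_sr, sire_sr) in percent and won is 0 or 1.
    """
    # Order past runners by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # The vote: winners among the k neighbours divided by k.
    return sum(won for _, won in nearest) / k

# Made-up data purely for illustration: each row is one past runner.
random.seed(1)
rows = []
for _ in range(500):
    features = (random.uniform(0, 25), random.uniform(0, 25), random.uniform(0, 25))
    # Assumption for the toy data: higher strike rates win a little more often.
    won = 1 if random.random() < sum(features) / 300 else 0
    rows.append((features, won))

# 80% to train on, 20% held back for testing.
split = int(len(rows) * 0.8)
train, test = rows[:split], rows[split:]

# The horse from the text: TRS = 14%, JOSR = 10%, SISR = 7%.
prob = knn_win_probability(train, (14.0, 10.0, 7.0), k=9)
print(f"Estimated win probability: {prob:.2f}")

# Treat the first five held-back runners as a hypothetical race and rate them.
race = [features for features, _ in test[:5]]
ratings = sorted(((knn_win_probability(train, f), f) for f in race), reverse=True)
for p, f in ratings:
    print(f"{p:.2f}  TRS={f[0]:.1f} JOSR={f[1]:.1f} SISR={f[2]:.1f}")
```

Every runner gets a number, not just the ones that clear a set of thresholds, and that is what gives you a rank order in each race. With K set to 9 the probabilities can only move in steps of 1/9, so ties within a race are common in this simple form.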
KNN is just one of the ML algorithms available, and in my push-button software you can create a model, run it against today's racing and produce your own ratings. If you remember the RSB software from around 2000, then think of something similar but more powerful.
If you are interested in the software and want to see a sample of its development check out the following links