The First Rule of Betting

Stumbled on a tweet the other day asking

“What is your best advice to a first-time bettor?”

There were a lot of good responses, such as manage your bankroll, bet on your strongest opinions, do not chase losses, and so on; some were far more esoteric. Nothing wrong with these suggestions, but I wondered: are they really the lessons that will get you to first base? As for backing your strongest opinions, well, if you think a novice punter can figure that out then please explain in the comment section below.

My response concerned the first fundamental mind shift that a punter has to make in order to get to first base. It is something that will at least mean they are on firm ground when looking to get to second base, and one paradigm shift that, hopefully with some examples and suggestions, can be made by everyone: understand that it is all about the price.

What do I mean by this? Well, it is the basic idea that unless you are backing selections that are over priced, ie bigger than their true chance of winning, then you are not going to make money in the long run. It is as simple as that, really. Just about everybody will agree with that, but the vast majority have trouble implementing this approach. To bet on something that you think is too big when you do not think it is the most likely winner is a difficult thing to get your head around and act on. In fact it is worse than that, as the vast majority do not even think in terms of chance. They stick to the simple idea of what is going to win, and then they decide to bet on it depending on whether the payoff would titillate them or not. At even money it hardly seems worth it, but at 3/1 I could buy a new suit.

Value is the word I have avoided using, a word that even some pundits on TV fail to understand. How does one evaluate value? How do you arrive at value bets? There are two main approaches.

The first approach is the more nuanced and requires a certain skill set. It involves calculating the chance of, say, each horse in a race, then comparing that chance to the odds the exchanges are offering, and deciding bet or no bet depending on whether you think the selection has a better chance than the odds indicate. Calculating these chances or probabilities can be done via a couple of methods. The first is seat-of-the-pants experience based on form reading and knowledge of individual horses. This takes time and experience and puts a lot of emphasis on your own strength of mindset; after all, when things are going poorly you somehow have to operate in the same consistent manner. This is very difficult to do, but there are people who can simply spot over priced selections. The disadvantage of this method is that you probably generate fewer bets, which can be a problem for reasons outside the scope of this blog post.
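To make the price comparison concrete, here is a minimal sketch; the probability and the exchange price are invented numbers for illustration, not anything derived in this post.

```python
# Minimal sketch of the value check: back a selection only when your own
# estimated win probability beats the probability implied by the price.
def is_value_bet(my_prob, decimal_odds):
    implied_prob = 1.0 / decimal_odds   # chance the decimal price implies
    return my_prob > implied_prob       # value only if you rate it bigger

# You make a horse a 30% chance but it trades at 4.0 (25% implied): value.
print(is_value_bet(0.30, 4.0))
```

Note that the horse need not be the most likely winner; a 30% chance trading at 4.0 is a bet, while an odds-on favourite trading below its true chance is not.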

The second method within this first approach is to create statistical models based on input data from, say, horse racing if that is your field of betting. Such a model can then generate probabilities for each horse in a race. The benefit of this approach is that it can be automated and can produce more turnover in bets.

The second main approach to producing value is not to attempt to calculate individual odds for horses, but to try and find categories of horses that are overpriced. For example, you may find that Richard Fahey’s horses when ridden by a claimer are over priced and return a profit long term. You do not know which individual ones are over priced; you just know they are as a collective. Of course this can change, as can the relative weights or merit of input values to a model. Also, with a model you do not know which horses’ probabilities were correct and which were wrong; you just know they work as a collective. The key difference, though, is that with the Fahey runners you are backing them regardless of price, whereas with a model you will skip some horses because of the price.

Now I am not suggesting that the last few paragraphs should be thrown at a novice bettor, but it is absolutely fundamental to betting that a new punter gets their head around the value concept. Pretty much all else is lost if they do not grasp this and base their betting upon it.

If you found any value in this blog post, let me know via the rating below and the comment section.

Pre Cheltenham Going

With Cheltenham only a couple of weeks away, some concern was expressed on the MySportsAI email forum about the extreme prevalence of soft and heavy going this NH season and how this may cause upsets at Cheltenham, when we are likely to hit better going. Could all those fancied horses who have done well over the winter on soft going come unstuck at Cheltenham when better ground is likely? If this is the case then bookmakers could be in for a good week.

To examine this I looked back at the last 10 seasons and did a simple calculation. Using a numeric value for the going, where Heavy = 6 and Firm = 1, I calculated an average going for the months November to February in each year. I then examined the fate of the market in that year’s Cheltenham festival. Perhaps higher average pre-Cheltenham going would mean more market upsets.
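The calculation can be sketched as follows. Only the Heavy = 6 and Firm = 1 endpoints are stated above, so the values I give to the intermediate going states are my own assumption.

```python
# Sketch of the average-going calculation. Heavy = 6 and Firm = 1 as stated;
# the values for the intermediate states are an assumption on my part.
GOING_VALUE = {'Firm': 1, 'Good to Firm': 2, 'Good': 3,
               'Good to Soft': 4, 'Soft': 5, 'Heavy': 6}

def average_going(goings):
    # goings: the official going for each race run November to February
    return sum(GOING_VALUE[g] for g in goings) / len(goings)
```

A winter dominated by Soft and Heavy races will push the average towards 5 or 6, while a dry one pulls it down towards 3.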

Here are the results for all Betfair SP prices and laying to win £1

Year | Pre going | LayVarPL | Avg win price
2010 | 4.62 | +2.174 | 20.84
2011 | 4.43 | -0.61 | 15.2
2012 | 4.14 | -0.02 | 14.46
2013 | 5.03 | -1.54 | 15.98
2014 | 4.8 | +0.06 | 17.27
2015 | 4.78 | -0.42 | 13
2016 | 5.01 | +2.17 | 10.7
2017 | 3.05 | -0.61 | 17.1
2018 | 4.56 | -0.12 | 15.12
2019 | 3.68 | -0.35 | 24.99

As we enter the 2021 Cheltenham festival the pre-going value is 4.54, not as generally soft as some other years.

The above figures do not seem to spell pending doom for punters at the front of the market; indeed the best year for layers was 2016, which came off a seemingly wet 5.01 winter. As I have always said, in general terms does the going really matter? Well, not really.

Cheltenham Wisdom of Crowds

A quick fun post on this glorious Sunday morning. The spring weather is making an early appearance and no sane person should be sat in front of a laptop. The Wisdom of Crowds approach to betting centres around finding horses that are x percent above their average price with a bookmaker or two, working off the idea that the consensus is right and the out-of-line books are wrong, so a profit can be made backing those out-of-line prices.

Cheltenham is only just over two weeks away, so applying this principle to ante post odds I wondered whether it could yield some ante post profit. Here is a list of horses that are at least 1.4 times bigger than the average price with a bookmaker or two (no exchanges used, so as not to inflate the averages). I have also restricted the numbers by sticking to the first 10 in the betting, as this will hopefully reduce the NRs who are simply big prices because they are not running. No guarantee of this, but one hopes.
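That screening rule can be sketched as follows; the data layout (each horse's bookmaker prices and its rank in the betting) is my own invention for illustration.

```python
# Hypothetical sketch of the rule above: keep a horse if its best bookmaker
# price is at least 1.4x its average bookmaker price (exchanges excluded)
# and it sits within the first 10 in the betting.
def out_of_line(horses, ratio=1.4, max_rank=10):
    picks = []
    for h in horses:
        avg = sum(h['prices']) / len(h['prices'])  # average bookmaker price
        best = max(h['prices'])                    # biggest price available
        if h['rank'] <= max_rank and best >= ratio * avg:
            picks.append((h['name'], best))
    return picks
```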

OK, here is the list showing the maximum price available. I will leave it to you to check the bookmakers, but these are all available odds, shown here as decimals, ie 9/4 is 3.25. I will check them after Cheltenham concludes.

supreme-novices-hurdle Bob Olinger 21
arkle-chase Fusil Raffles 41
arkle-chase Blackbow 51
arkle-chase Felix Desjy 51
arkle-chase Sky Pirate 51
ultima-handicap-steeple-chase Escaria Ten 11
champion-hurdle Aspire Tower 21
champion-hurdle Concertista 26
mares-hurdle Honeysuckle 5
boodles-juvenile-handicap-hurdle Riviere Detel 15
national-hunt-chase Royale Pagaille 6.5
national-hunt-chase Secret Reprieve 17
national-hunt-chase Dickie Diver 21
national-hunt-chase Pencilfulloflead 21
ballymore-novices-hurdle Appreciate It 9
ballymore-novices-hurdle Ballyadam 21
brown-advisory-novices-chase Royale Pagaille 9
brown-advisory-novices-chase Latest Exhibition 15
brown-advisory-novices-chase Colreevy 34
cross-country-chase Shady Operator 16.2
champion-bumper Eileendover 29
champion-bumper Good Risk At All 30
champion-bumper Letsbeclearaboutit 42
champion-bumper Balco Coastal 124
marsh-novices-chase Energumene 9
marsh-novices-chase Monkfish 11
ryanair-chase Chacun Pour Soi 17
ryanair-chase Kemboy 17
stayers-hurdle Roksana 15
stayers-hurdle McFabulous 21
stayers-hurdle Kemboy 21
stayers-hurdle Champ 21
stayers-hurdle The Storyteller 21
paddy-power-plate Chatham Street Lad 21
mares-novice-hurdle Gauloise 15.5
mares-novice-hurdle Mighty Blue 32
county-hurdle Blue Lord 15
hunters-chase Sametegal 23
hunters-chase Shantou Flyer 26
hunters-chase The Worlds End 24
mares-chase Dame De Compagnie 26

Horse Stride Length as a Predictive Measure

Horse stride length and horse stride rate, ie strides per second, are two relative newcomers in the form analysis tool kit. Simon Rowlands has documented numerous times that strides per second remains fairly constant regardless of going, but stride length varies according to going. If we accept that a horse’s ability to win a race rests on the equation of stride length x stride rate, plus of course how long it can maintain that at or near its maximum, then I am left wondering whether using stride length as a predictor of going preference might be a useful tool to assist our betting. My line of thinking here, and I admit this is early days, is this: if all horses’ stride lengths shorten by x from good going to soft going, but a horse with little exposed form runs on both these grounds and his stride shortens significantly less than the expected average despite running poorly on the soft ground (perhaps outclassed or unfit), we could have a horse labelled as a soft ground disliker when in fact he is completely happy on it.

Let me also say up front that I have previously stated that going does not matter, but when pressed on this I have expanded my opinion to state that overall, when modelling, it has limited benefit, probably because the market takes care of it. However, that is not to say that in individual cases good bets cannot be made because of ground. I have had a few myself in the past, but when modelling we are looking at overall contribution to a model, and personally I have not found much value in the going.

Let us take a look at some stride averages (all horses, 2017 to 2020) based on race distance and going. I have not accounted for class yet, as I am simply wanting to corroborate others’ findings on stride length.

5f Firm 23.97

5f GF 24.73

5f Gd 24.62

5f GS 24.24

5f Sft 24.06

5f Hvy 23.08

5f St 24.29

5f St/Sl 23.98

The above pattern is replicated across all distances: the slower the ground, the shorter the stride length. We can see that over 5f the drop from Gd to Sft ground represents a 2.27% drop in stride length. Now what if our horse Mudlark makes his first attempt at soft ground having run over good ground, and happens to run poorly but logs a stride length percentage change of, say, only 1.5%? Could these horses have hidden preferences that the public is not only unaware of but also betting in the opposite direction of when Mudlark next runs on soft ground?
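As a rough sketch of that comparison: Mudlark's figures below are invented, while the 2.27% Good-to-Soft average drop comes from the 5f table above.

```python
# Flag a horse whose stride-length drop on softer ground is smaller than
# the average drop for that distance and going change.
def smaller_than_average_drop(stride_good, stride_soft, average_drop_pct):
    drop_pct = (stride_good - stride_soft) / stride_good * 100
    return drop_pct < average_drop_pct

# 5f Gd -> Sft average above: (24.62 - 24.06) / 24.62 * 100, about 2.27%.
# Mudlark (invented): 24.0 on good, only drops 1.5% to 23.64 on soft.
```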

This is an interesting area of research, and your thoughts at this stage will be welcome in the comments below.

Generalized Additive Models

Some preliminary thoughts on GAMs, which I have been playing with this afternoon. GAM stands for Generalized Additive Model. In a sentence or two, what are they?

Well, think of a linear relationship like qualifications and salary: as qualifications go up, so does salary. Let’s assume it is pretty much a straight line, and so linear regression would do a good job of capturing the relationship.

But what if we have something that goes up in a nice straightish line but then tapers off, like age and salary? As we get older we generally earn more, but past 65 or so we tend to earn less. Linear regression will fit a straight line through this as best it can, but it won’t do the best job of capturing the tapering off. We would like a fitted line that captures the tapering off as well as the straight line part. Also, we want something that can do all this and slot these lines together when we have more than one input feature, for example age and qualifications together predicting salary.

This is what GAMs do, and like regression they discover coefficients (weights) for the individual features, so unlike black box algorithms you can look at the result and say trainer strike rate contributes y to the prediction whilst jockey strike rate contributes z.

I have been playing with pyGAM, a GAM module, and initially got some interesting results.
Using a walk-forward train/test (train on 2011, test on 2012; train on 2011 and 2012, test on 2013; and so on up to 2015) and using a LogisticGAM on a simple model, ie 3 input features, running under MySportsAI, I got the following results for the top 3 rated, all to Betfair SP after 2% commission.

The average variable-stake ROI comes out at around 0.77% for the top 3 rated.
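For anyone wanting to replicate the set-up, the walk-forward splitting can be sketched like this; the actual model fitting (pyGAM's LogisticGAM on the three features) is omitted.

```python
# Sketch of the walk-forward scheme described above: train on all years up
# to and including year t-1, test on year t, for t = 2012 .. 2015.
def walkforward_splits(first_year=2011, last_test_year=2015):
    splits = []
    for test_year in range(first_year + 1, last_test_year + 1):
        train_years = list(range(first_year, test_year))
        splits.append((train_years, test_year))
    return splits

# Each split would then fit a model, eg pyGAM's LogisticGAM, on the train
# years and score the held-out test year.
```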

Now compare this with plain old Logistic Regression

LogisticGAM offers quite an improvement, but it still does not manage to match Gradient Boosting.

Both Logistic and Linear GAMs will be available in the March release of MySportsAI.

You can find out more about MySportsAI at

Web Scraping With Selenium 4

In the last session we learned how to use Selenium to populate a login form and submit it. The next step in my task is to fill in this upload page, so that I have a program that will do all this without me even having to visit the web site manually, so to speak.

We have two items to populate here: the csv file name to upload from my drive, and the upload password. The values of these items were read in from my txt file in the first part of the code. I also want to make sure that if more than one file is specified for uploading then the code will handle multiple files.

Here is the final piece of code I need to add to my program

The first line loops through each entry of the list we populated from smartpass.txt, but it starts from index 3 (ie the 4th line, as the first line is index 0). If there is only one file to upload then this loop will only iterate once. The body of the loop chops the carriage return off the file name and then loads up the page shown above. The rest of the code is pretty much the same as before, after of course we have manually searched the HTML source of the page and identified the input box names.
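As a rough sketch of the loop just described: the input box names ('csvfile' and 'wompass') below are assumptions on my part, so check the upload page's HTML source for the real names.

```python
# Hypothetical sketch of the upload loop. uploadData is the list read from
# smartpass.txt; entries from index 3 onwards are file names to upload.
def upload_files(driver, uploadData, womPass, upload_url):
    for line in uploadData[3:]:
        fileName = line[:-1]                 # chop the carriage return
        driver.get(upload_url)               # load the upload page
        fileBox = driver.find_element_by_name('csvfile')
        passBox = driver.find_element_by_name('wompass')
        fileBox.send_keys(fileName)          # full path of the csv file
        passBox.send_keys(womPass)           # the upload password
        passBox.submit()                     # submit the upload form
```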

We have touched on some ideas behind accessing forms and logging into pages using Selenium. Hopefully you found this useful, let me know via the ratings below.

Web Scraping With Selenium 3

In this session we are going to write the code to login to the SmarterSig web site. To do this we will need to know the names of the user id and password box. If we right click on the home page and select ‘view page source’ we should be able to search for ‘login’ and find the following section of HTML code

The names of the two input boxes are, unsurprisingly, userid and password.

With this information we can now get selenium to load up this web page in a browser on our screen and populate the two boxes with our userid and password and then submit them. Here is the total program to date

import selenium
from selenium import webdriver
import time

uploadData = []

# input the access data
f = open('smartpass.txt', 'r')
uploadData = f.readlines()
user = uploadData[0]
passw = uploadData[1]
womPass = uploadData[2]

# trim carriage returns
user = user[:-1]
passw = passw[:-1]
womPass = womPass[:-1]

PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

driver.get(' ')

# locate and fill userid and password input boxes
idBox = driver.find_element_by_name('userid')
passBox = driver.find_element_by_name('password')
idBox.send_keys(user)
passBox.send_keys(passw)

# submit the form
passBox.submit()

# pause and then quit the browser
time.sleep(5)
driver.quit()

The final couple of lines simply allow me to see the logged-in screen before quitting the browser. There are a number of ways to submit the userid and password form. Calling submit on one of the fields, in my case the password field, is just one of them.

In the next section we will look at the next page we need to access now we are logged in and how to upload a file.

Web Scraping With Selenium 2

We are going to access some web pages that require form input on the SmarterSig web site. The first page will be the home page, where we need to login.

The second page we will be interacting with once we have successfully logged in will be the following

Clearly the code needs access to the following pieces of data in order to carry this out:

Your SmarterSig userid (login)

Your SmarterSig password

The password for uploading

The full path name to the location on your machine of the file you wish to upload

We place the above in a file I call smartpass.txt, with each line containing one of the above items.

Now we are ready to code the program.

First few lines of our program

import selenium
from selenium import webdriver

uploadData = []

f = open('smartpass.txt', 'r')
uploadData = f.readlines()
user = uploadData[0]
passw = uploadData[1]
womPass = uploadData[2]

# trim the carriage returns
user = user[:-1]
passw = passw[:-1]
womPass = womPass[:-1]

The first two lines import the libraries we will need

The third line declares a list to hold the lines of information in the smartpass.txt file once we have read them in.

The 4th and 5th lines open the file for reading and read all the lines into the list so we can access them in the program.

The remaining lines take care of two things. First, they place the data held in the list into individual variables; I prefer to do this so that each element has a more meaningful name. Second, each data item needs its last character removing, because it is a carriage return character.

The business of handling the file name we want to upload we will come to later, but bear in mind that we want the code to handle multiple files (ie multiple lines in the smartpass.txt file) should you want to upload more than one.

OK, we will dig a bit deeper in the next session.

Web Scraping With Selenium

I have posted previously on web scraping data for betting purposes, but I have not gone into how we can access pages that require form filling; a typical example might be logging into a site before you can access data, or maybe submitting information via a form. This is the first in a short series on how to use Python and the Selenium library to carry out these functions.

First some prep work for you. Selenium works by literally, under the control of the program you write, popping a web browser up on your screen and then filling in form boxes and submitting the form in order to refresh the page, perhaps with a new web page. When your program is running it will feel like someone else is controlling your computer, but do not worry, it’s you, or should I say the program you are about to write.

First things first though. I will be demonstrating code that will be manipulating a Chrome browser, more specifically Version 87.0.4280.141 (Official Build) (64-bit).

You might want to download Chrome if you have not already got it installed. You can change the code to handle other browsers, but in this blog I will be handling Chrome.

The next thing you need to do is download the Chromedriver.exe utility. You can download it from

Make sure you get the version for your Chrome web browser. If you are not sure what version of the Chrome browser you are using then click the three dots in the top right corner of your Chrome browser and then select Help followed by about Google Chrome.

Once you have downloaded chromedriver.exe, save it (assuming you are using Windows) in your folder

C:\Program Files (x86)

So you now have

C:\Program Files (x86)\chromedriver.exe

The final step of this set-up procedure is to install Selenium. I am assuming you already have Python installed. To install Selenium, type the following at the Windows command prompt

pip install selenium

OK, that’s the end of the first session; any problems or comments, please comment below. In the next session we will start on some web browsing under program control.

Genetic Algorithm V Gradient Boosting

I have been playing around with a Python library called PyGAD. It is a Genetic Algorithm (GA) library that enables you, from within the Python programming language, to create a Genetic Algorithm approach to Machine Learning. I have mentioned before how useful Gradient Boosting (GB) is for racing data, due to its ability to handle imbalanced data (ie far more losers in the data set than winners); plus it handles data that has not been normalized in any way quite well. However, one disadvantage is that when you train a model using GB, or any other Machine Learning algorithm, you are essentially training the model to find winners rather than profit. To illustrate what exactly I mean by this, imagine a ridiculous example where we doubled the price of every horse above 20/1, and then also imagine we included the Betfair SP price in the data set we are training on. You would like to think that the model would spot that longer priced horses are the route to profit, but it won’t, simply because they still win less often than 10/1 shots, which win less often than 8/1 shots, which win less often than 6/1 shots, etc. The algorithm will latch onto the fact that nothing predicts winners better than BFSP, and it will focus on BFSP. This is why you should never include BFSP as an input feature.

One way around this is to create a custom loss function which forces the algorithm to train for profit rather than winners. This is easy to do in Python and PyGAD, so I set about investigating whether a GA trained to find profit, where profit was calculated to variable stakes, would outperform a GB model which is trained to find winners.
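The profit-based fitness the GA maximises can be sketched as below. The staking (variable stakes to win one point) and the data layout are assumptions for illustration; in PyGAD this sort of function would be wrapped up as the fitness function the GA evolves against.

```python
# Sketch of a profit-based fitness: score a candidate weight vector by the
# variable-stake profit of backing each race's top-rated horse. The data
# layout (features, decimal odds, won-flag) is an assumption for illustration.
def profit_fitness(weights, races):
    profit = 0.0
    for runners in races:
        # rating = weighted sum of each runner's input features
        top = max(runners, key=lambda r: sum(w * f for w, f in zip(weights, r[0])))
        features, odds, won = top
        stake = 1.0 / (odds - 1.0)          # stake to win 1 point at these odds
        profit += 1.0 if won else -stake    # win 1 point or lose the stake
    return profit
```

The GA then searches for the weight vector with the highest profit, instead of the one that classifies the most winners.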

I trained both approaches on data from 2011 to 2017. The data was the data I use for my model submission to the Wisdom of Models. The data for the GA model was normalized; instinct told me this would be the better option, although I should test it without as well. The test data was 2018 to 2019.


The GB model made an ROI to variable stakes on top rated horses of 2.68% after commission.

The GA model made an ROI of 0.30%.

To my surprise the GB model solidly outperformed the GA model even though the GA model was trained to produce profit.

Overview of Genetic Algorithms