The Wisdom of Models

The wisdom of crowds has been applied to many avenues of prediction: stock markets, Oscar night, elections and, of course, sports betting. The general idea is this. Take a collection of people’s predictions, each of which may be only average in quality on its own. Aggregating them in some way can produce a final set of predictions that is better than the individual ones. A famous early example comes from a country fair where the public were asked to guess the weight of an ox. The average of everyone’s guesses turned out to be remarkably close to the ox’s true weight. Not sure if anyone took home the ox, but you get the idea.
A similar approach appears in the Machine Learning literature, where it goes by the name of ensemble modelling. The idea is the same: the predictions of several models are combined in some way to produce a single set of predictions.
As with human predictions, diversity is the key. It works best when the people or models come at the problem from very different viewpoints. In a horse racing context, for example, one model might have race times at the core of its modelling while another is more class-based. There is certainly evidence within the Machine Learning world that ensemble modelling can outperform single models. The ensemble can also extend to different algorithms rather than just different model inputs. You could build an ensemble from a tree-based algorithm like gradient boosting, a regression algorithm and a neural network. They may all have the same inputs but take different approaches to creating the model.
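The simplest way to combine such models is to average each horse’s win probability across the models and then rescale, so the race sums to 1. Here is a minimal sketch of that. The model names and probabilities are hypothetical. Plain averaging is only one of many possible combination rules.

```python
from statistics import mean

def ensemble_average(model_probs):
    """model_probs maps model name -> {horse: win probability}.
    Average each horse's probability across models, then renormalise so the
    race's probabilities sum to 1."""
    horses = next(iter(model_probs.values())).keys()
    averaged = {h: mean(probs[h] for probs in model_probs.values()) for h in horses}
    total = sum(averaged.values())
    return {h: p / total for h, p in averaged.items()}

# Hypothetical outputs from three differently built models for one race.
race = {
    "times_model":   {"Horse A": 0.50, "Horse B": 0.30, "Horse C": 0.20},
    "class_model":   {"Horse A": 0.20, "Horse B": 0.50, "Horse C": 0.30},
    "boosted_trees": {"Horse A": 0.35, "Horse B": 0.40, "Horse C": 0.25},
}
combined = ensemble_average(race)
print(sorted(combined, key=combined.get, reverse=True))  # ['Horse B', 'Horse A', 'Horse C']
```

Horse B comes out on top without being first choice for every model. The disagreements between the models are where the diversity earns its keep.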
We are running an experiment at MySportsAI at the moment. Some members put forward their model ratings each day. We combine them with a simple ranking aggregation and track the top 3 ranked horses, to see how the wisdom of models performs.
So far, in handicaps for 3yo and 3yo+, the collaboration has produced the following results to BFSP after commission:

206 bets, PL +19.1 pts, ROI +9.27%

Early days but an interesting start.
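The post does not spell out the exact aggregation rule used in the experiment. What follows is only a sketch of one simple possibility, an average-rank (Borda-style) combination. Ranks rather than raw ratings are averaged because each member’s ratings may be on a completely different scale. The member names, horses and ratings are all hypothetical.

```python
from collections import defaultdict

def aggregate_by_average_rank(ratings_by_member, top_n=3):
    """ratings_by_member maps member -> {horse: rating}, higher = more fancied.
    Each member's ratings become ranks (1 = top rated); ranks are averaged per
    horse and the top_n horses with the lowest average rank are returned."""
    rank_sum, count = defaultdict(float), defaultdict(int)
    for ratings in ratings_by_member.values():
        ordered = sorted(ratings, key=ratings.get, reverse=True)
        for rank, horse in enumerate(ordered, start=1):
            rank_sum[horse] += rank
            count[horse] += 1
    avg_rank = {horse: rank_sum[horse] / count[horse] for horse in rank_sum}
    return sorted(avg_rank, key=avg_rank.get)[:top_n]

# Hypothetical ratings from three members, each on their own scale.
race_ratings = {
    "member_1": {"A": 90, "B": 80, "C": 70, "D": 60, "E": 50},
    "member_2": {"B": 0.30, "A": 0.25, "D": 0.20, "C": 0.15, "E": 0.10},
    "member_3": {"A": 12, "C": 10, "B": 9, "E": 5, "D": 4},
}
print(aggregate_by_average_rank(race_ratings))  # ['A', 'B', 'C']
```

No member’s model is revealed here, only the order it puts the runners in. That fits the point below about keeping proprietary models private.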

Two other facets of this approach are interesting. First, no one’s proprietary model is compromised, because the inner workings of each model are kept undisclosed. Second, the collaboration offers mutual support. We all know that betting can be a lonely business. Crowd betting offers buddy support, something we all need when results are going poorly.

You can join MySportsAI at

Derby 2020

I never like to read journalists slagging off the public for taking an interest in the sport that provides them with a comfortable living. For sure, those on Twitter can overreact. Equally, most journalists underreact when faced with a topic that may threaten their job prospects or their chances of a chummy weighing-room interview. Nicholas Godfrey penned such a piece on this year’s Derby. He opens with a few snide remarks about social media, something he clearly feels is beneath him. Luckily, Nicholas looks like he is approaching retirement, which is just as well. That tacky social media he refers to is probably going to replace him. With a bit of pruning, it is certainly where the more intelligent analysis, or at least the synopsis, is taking place.
I would not have objected if Nicholas had offered any original insight into the race. But sectionals are clearly a bit too technical for him, along with social media, so I will try to add some extra analysis here, above and beyond what has already been said.
I too had looked at the Oaks and the Derby in terms of sectionals. Twenty minutes after the Derby I posted on Twitter that Love would have mowed down Serpentine race for race, one of those knee-jerk reactions that Nicholas is so dismissive of. Later, however, I looked for a Derby run in a similar overall time and, most importantly, on the same official going. The race I homed in on was Authorized’s Derby. Taking timings to the path entering the straight, Authorized hit the road in 1:53.58. In 2020, the most prominent horse with any chance of winning according to the betting hit the road in 1:56.22. That is a huge difference of 2.64 seconds, quite probably impossible to overcome. It is fair to say that Serpentine’s jockey got the fractions right. The 15 other top-class jockeys seemingly had no idea whether the pace they were setting was correct or not. A road that most journalists will never hit.
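The arithmetic behind that gap is simple enough to sketch. The conversion to lengths uses an assumed rule of thumb of 5 lengths per second, chosen purely for illustration. The real figure varies with going and distance, so treat the lengths as a rough guide only.

```python
def to_seconds(race_time: str) -> float:
    """Convert an 'm:ss.ss' time such as '1:53.58' into seconds."""
    minutes, seconds = race_time.split(":")
    return int(minutes) * 60 + float(seconds)

authorized = to_seconds("1:53.58")   # Authorized, to the path entering the straight
derby_2020 = to_seconds("1:56.22")   # most prominent 2020 horse with a winning chance
gap = derby_2020 - authorized

# Assumption for illustration only: lengths per second varies with going and distance.
LENGTHS_PER_SECOND = 5.0
print(f"{gap:.2f} s, roughly {gap * LENGTHS_PER_SECOND:.0f} lengths")  # 2.64 s, roughly 13 lengths
```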
I have not done the middle-section timings, but my visual guess is that the race was lost in the middle section, where the pace appeared to slow. Any middle- or long-distance runner will tell you not to make ground in the teeth of a race, unless the leaders have gone ridiculously fast. Instead, gain ground in the cheap, slow section. Nobody bothered to do this, and hence the race was lost. I may be wrong on this last part; only the times will tell.
The big question now is this: will you back or lay Serpentine when he next runs in a G1, or will you join Nicholas on the fence? Let me know in the comments below.

Stacking Ensembles for Horse Racing

Imagine you had x mates, all experts in a certain field of betting on horse racing. One is an expert on breeding, another is very knowledgeable about draw bias, and a third is shit hot on trainer/jockey combos. I could go on, and the topic of expertise does not really matter. The main point is: how would you synergise their opinions into a race selection? You could put them in a room together and let them debate a selection in the 2.30 at Sandown. The trouble with this is that the value of each may get drowned in the noise of the collective. The optimum way of combining these varied inputs may get lost in the futile attempt to combine them in one fell swoop, so to speak.

In the Machine Learning world there is a technique called ensemble stacking. This is slightly different to the above scenario. With ensemble stacking, different ML algorithms are trained on some data and then make predictions. Those predictions are fed into a second stage, whose job is to work out how best to combine them into a super prediction. This can often give better predictions on new data, especially if the algorithms are different in nature and therefore discover slightly different things about the data.
Sound familiar? Well, this approach can be used for ML models for horse racing. Instead of throwing the kitchen sink at a single model build, perhaps results would improve if various models were built on tightly related subsets of the data. Their predictions could then be fed into a second-layer predictor that combines them into one final prediction. This also sounds like a close cousin of the two-step process I covered in an earlier article. Unless you are like me (I could pick an argument in an empty room), this may well be an approach worth exploring.
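Here is a minimal sketch of the second layer of a stack. It assumes two base models, a hypothetical breeding model and draw-bias model, have already produced win probabilities for hold-out runners that neither of them was trained on. That separation matters, otherwise the second layer learns from over-optimistic predictions. Logistic regression is my choice of combiner here, fitted by plain gradient descent so the example needs nothing beyond the standard library. All the data is made up.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(rows, labels, lr=1.0, epochs=3000):
    """Fit a plain logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def stacked_probability(meta_model, base_probs):
    """Second-layer prediction from the base models' probabilities."""
    w, b = meta_model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, base_probs)) + b)

# Hypothetical level-0 output on hold-out runners: (breeding_p, draw_p, won).
holdout = [
    (0.40, 0.35, 1), (0.10, 0.15, 0), (0.30, 0.10, 0), (0.05, 0.30, 0),
    (0.35, 0.40, 1), (0.15, 0.05, 0), (0.20, 0.30, 1), (0.10, 0.10, 0),
    (0.40, 0.30, 0),
]
meta = fit_logistic([[bp, dp] for bp, dp, _ in holdout], [won for _, _, won in holdout])
p_high = stacked_probability(meta, [0.40, 0.40])  # both experts keen
p_low = stacked_probability(meta, [0.10, 0.10])   # both experts lukewarm
print(round(p_high, 3), round(p_low, 3))
```

The learned weights tell you how much say each “mate” gets in the final prediction. That is exactly the job the room full of experts struggles to do.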

If you are interested in exploring Machine Learning for producing your own ratings but do not have any programming skills, don’t worry. I have produced some click and go software for developing ML models for sport. Check out the following

Two Step Models for Horse Racing

A machine learning algorithm like a neural network takes a set of data along with a target feature and attempts to find relationships between the inputs and the target. For example, let’s look at a real-life case, the Titanic data set. Here we may have the following inputs in our data:
class of ticket, male/female, port embarked, age of person and so on.
The output feature is whether the passenger survived or died.
We feed this data to a machine learning algorithm. The hope is that, with a little help from us, the algorithm can model the data. It can then make accurate predictions about whether each passenger survived on fresh Titanic data that we held back. With horse racing we are trying to predict whether a horse will win or lose, and of course our input features will be very different.
A question often pondered is whether to include the starting price or Betfair starting price in the model’s input features. After all, we are told the market carries a wealth of information, some public and some not. The problem with including the price as an input feature is that the SP is such a good predictor of the chance of winning. The ML algorithm will ignore all your other inputs and blindly follow the SP as its main predictor. Well, if life were that easy we would just go ahead and back all odds-on shots. The MySportsAI output illustrates this with three input features: trainer strike rate, jockey strike rate and BFSP. The feature importance plot at the bottom shows that BFSP has dwarfed jockey strike rate, and trainer strike rate is barely visible.



So how can we utilize BFSP without it dominating the attention of our algorithm? One approach is to use a two-step process.

First, train a model on the fundamental features alone, in the above example trainer strike rate and jockey strike rate. Do this on a quarter of the data. I will call this model 1. Then use model 1 to predict winning probabilities on the second quarter of the data, and combine these predictions with the BFSP from the second quarter. Next, train a new model, model 2, on this second-quarter data, which now contains the model 1 predictions and the BFSP. The BFSP may be transformed at this step into the natural log of its implied chance, but let’s not worry about that for now.

To test model 2 on the third quarter, first create the third-quarter data by running model 1 and combining its output with BFSP. After perhaps some hyperparameter tuning of model 2, we can do a final test on the fourth quarter.
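The quarter-by-quarter process above can be sketched as follows. This is an illustration under stated assumptions, not the MySportsAI implementation. Logistic regression is used at both steps. The field names (`trainer_sr`, `jockey_sr`, `bfsp`, `won`) and the synthetic runners are hypothetical. BFSP enters model 2 as the natural log of its implied chance, 1/BFSP, with no overround correction. Log loss is my choice of yardstick for the third and fourth quarters.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Plain logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def log_loss(probs, labels):
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels)) / len(labels)

def fundamentals(r):
    return [r["trainer_sr"], r["jockey_sr"]]

def outcomes(runners):
    return [r["won"] for r in runners]

def step2_features(model1, r):
    """Model 1's win probability (as log-odds) plus ln of BFSP's implied chance."""
    p1 = predict(model1, fundamentals(r))
    return [math.log(p1 / (1 - p1)), math.log(1 / r["bfsp"])]

def two_step(runners):
    """runners must be in date order; they are split into four equal quarters."""
    q = len(runners) // 4
    q1, q2, q3, q4 = (runners[i * q:(i + 1) * q] for i in range(4))
    model1 = fit_logistic([fundamentals(r) for r in q1], outcomes(q1))  # step 1
    model2 = fit_logistic([step2_features(model1, r) for r in q2], outcomes(q2))  # step 2
    scores = {}
    for name, part in (("q3_validation", q3), ("q4_final_test", q4)):
        probs = [predict(model2, step2_features(model1, r)) for r in part]
        scores[name] = log_loss(probs, outcomes(part))
    return model1, model2, scores

# Hypothetical synthetic runners, for illustration only.
rng = random.Random(7)
runners = []
for _ in range(400):
    t, j = rng.uniform(0.05, 0.25), rng.uniform(0.05, 0.25)
    true_p = (t + j) / 2
    runners.append({"trainer_sr": t, "jockey_sr": j,
                    "bfsp": rng.uniform(0.8, 1.2) / true_p,
                    "won": int(rng.random() < true_p)})
model1, model2, scores = two_step(runners)
print(scores)
```

Comparing model 2’s two weights shows how much the fundamentals still add once the market has had its say.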

The idea behind this process is that the fundamental features, e.g. trainer and jockey strike rate, get a chance to be heard in the first model build before being combined with BFSP in model 2. You will often find that input features that were significant in model 1 are no longer significant in model 2. The betting public has simply already accounted for them in the BFSP. In Sung’s paper on this subject, the jockey lost significance but the draw remained significant.

I plan to add an automated two-step facility to MySportsAI in the near future.

Machine Learning with MySportsAI

Imagine betting in the 1970s and 80s. If you cannot remember back to that time, let me guide you there, specifically to the world of horse betting. To make selections you would either buy the Sporting Life or pop down to the local bookmaker’s and read the form off the walls. This still happens today, but anybody reading form off a bookmaker’s wall these days is seen as a dinosaur, the bottom of the betting food chain. Now imagine you were back in the 70s and the main source of analysis, the racing form pullouts, was only available to a select few. Everyone outside this club had to rely on newspaper tipsters’ selections or whatever they saw on unrecorded TV coverage. What chance would this vast majority have of approaching break-even, let alone a profit?
When racing data became available around the turn of the century, the split between the haves and the have-nots gained another division. The punters who recorded races on video suddenly needed to evolve. Data gathering and analysis skills were now needed, or they were in danger of being left behind. We are now in an era with a new division of punters. Machine Learning has produced programs that outplay the best chess players and, more recently, has conquered Go, a game with far more possible positions than chess.
It is inevitable, therefore, that Machine Learning will play an increasing role in sports betting analysis. Unlike previous evolutions in betting, though, ML has a steeper learning curve than controlling a video recorder or clicking some buttons to outrageously back-fit a racing system. ML production requires some coding skills. Even if you already code, you will need to adopt one of the main languages used for ML, such as R or Python. For many this will be a major investment of time. With this in mind I set about creating a software package with a graphical user interface that lets the user create ML models on horse racing data. You need no coding skill to use it, nor in-depth knowledge of Machine Learning, although you will most certainly pick up aspects of the field through using it.
Perhaps I am getting ahead of myself here. Some of you may be asking what exactly Machine Learning is. Let me compare it with system building, which most people will be familiar with. Imagine we have just three input variables: the jockey strike rate, the trainer strike rate and the horse’s sire strike rate. A system builder will try a multitude of combinations. One might be trainer strike rate greater than 12, sire strike rate greater than 10 and jockey strike rate greater than 8. If the results look profitable, hey presto, he has a system. There are lots of potential dangers with this approach. First of all, is it the optimum balance? Has he built it on some past data and then tested it on fresh data to see how it holds up? Finally, even if it is robust, it will only produce single bets in a race, and in many races no bets at all.
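Here is a minimal sketch of the example system above, with a crude check on fresh data. The rule is measured on earlier runners and then on later runners it never saw. The field names, the decimal SP column and the handful of runners are hypothetical, and level 1-point stakes are assumed.

```python
def qualifies(runner):
    """The example system: trainer SR > 12, sire SR > 10 and jockey SR > 8 (all %)."""
    return runner["trainer_sr"] > 12 and runner["sire_sr"] > 10 and runner["jockey_sr"] > 8

def system_results(runners):
    """Number of bets, level-stakes profit and ROI% for runners that qualify."""
    bets = [r for r in runners if qualifies(r)]
    profit = sum(r["sp"] - 1 if r["won"] else -1 for r in bets)
    roi = 100 * profit / len(bets) if bets else 0.0
    return len(bets), profit, roi

# Hypothetical runners: the system is built on earlier data, then checked on later data.
build_data = [
    {"trainer_sr": 15, "jockey_sr": 10, "sire_sr": 12, "sp": 4.0, "won": True},
    {"trainer_sr": 14, "jockey_sr": 9, "sire_sr": 11, "sp": 6.0, "won": False},
    {"trainer_sr": 10, "jockey_sr": 12, "sire_sr": 15, "sp": 3.0, "won": True},
    {"trainer_sr": 13, "jockey_sr": 11, "sire_sr": 13, "sp": 5.0, "won": False},
]
fresh_data = [
    {"trainer_sr": 16, "jockey_sr": 9, "sire_sr": 14, "sp": 5.0, "won": False},
    {"trainer_sr": 13, "jockey_sr": 12, "sire_sr": 11, "sp": 7.0, "won": False},
]
print(system_results(build_data))
print(system_results(fresh_data))
```

On the build data the rule shows a healthy profit from 3 bets. On the fresh data it loses both bets, the kind of check a back-fitted system often skips.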
Let me contrast that with an ML approach using one of the simplest ML algorithms to conceptualize, K Nearest Neighbours (KNN). The ML approach would typically split this data into 80% and 20% partitions, train the model on the 80%, and then test how it performs on the 20%. Training involves searching the data for the K patterns nearest (most similar) to the pattern the model is trying to predict. Let’s say K is 9; you can set this number. Suppose it is trying to predict the result for a horse with TRS = 14%, JOSR = 10% and SISR = 7%. It will search the data space for the 9 examples that most closely match this pattern and look at whether they won or lost. It then uses those 9 results in a vote to create a probability: if 3 won, the probability would be 3/9, or 0.33 (33%). So for every race you try to predict going forward, you will have a probability for each horse. That gives you a rank order in the race, i.e. a rating.
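Here is the nearest-neighbour vote as a minimal sketch. It assumes plain Euclidean distance on the three strike rates. In general KNN inputs should be put on comparable scales first, but here all three are already percentages. The past runners are hypothetical and built so that 3 of the 9 nearest to the example pattern won.

```python
import math

def knn_win_probability(history, runner, k=9):
    """Share of winners among the k past runners whose (TRS, JOSR, SISR)
    pattern is closest to this runner's. history: list of (pattern, won)."""
    nearest = sorted(history, key=lambda row: math.dist(row[0], runner))[:k]
    return sum(won for _, won in nearest) / len(nearest)

def rate_race(history, race, k=9):
    """Probabilities for every runner, sorted into a rank order (a rating)."""
    probs = {horse: knn_win_probability(history, pattern, k) for horse, pattern in race.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)

# Hypothetical past runners: ((TRS %, JOSR %, SISR %), won 1/0).
history = [
    ((14, 10, 7), 1), ((13, 10, 7), 0), ((15, 11, 7), 0), ((14, 9, 8), 1),
    ((14, 10, 6), 0), ((13, 11, 7), 0), ((15, 10, 8), 1), ((14, 11, 7), 0),
    ((13, 9, 7), 0), ((25, 20, 15), 1), ((2, 3, 1), 0),
]
print(knn_win_probability(history, (14, 10, 7)))  # 3 winners among 9 neighbours
print(rate_race(history, {"Horse A": (14, 10, 7), "Horse B": (25, 20, 15)}))
```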
KNN is just one of the ML algorithms available. In my push-button software you can create a model, run it against today’s racing and produce your own ratings. If you remember RSB software from around 2000, think of something similar but more powerful.
If you are interested in the software and want to see a sample of its development check out the following links

Would Hugh Taylor Pay Premium Charge?

I am beginning to feel like a Hugh Taylor stalker, but he is a convenient vehicle for examining one or two debates. One such debate popped up on my Twitter feed the other day. If you mention Betfair Premium Charge on Twitter, you can bet the vast majority of Betfair users will not know what it is, at least in detail. The simple reason is that it does not affect them and it is never likely to. The Premium Charge is an extra charge on top of commission, but you only qualify for it under a very specific set of circumstances.

You need to have played in over 250 markets over the lifetime of your punting, which is not too difficult to do. You also need to have generated commission that is less than 20% of your gross profit over your lifetime. These are the two main criteria.
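Those two criteria can be written as a simple check. This is a simplified sketch of the criteria as described here only. It ignores the rest of Betfair’s actual rules and charge rates, and treating a non-positive profit as “no charge” is my assumption. The example figures come from the 10% and 20% rows of the simulation later in this post, taking 250 markets at the start of the year plus that year’s bets.

```python
def premium_charge_applies(markets_played, commission_paid, gross_profit,
                           min_markets=250, max_commission_share=0.20):
    """Simplified check of the two main criteria: more than `min_markets`
    markets played, and lifetime commission below `max_commission_share`
    of lifetime gross profit."""
    if gross_profit <= 0:
        return False  # assumption: no profit, so nothing to charge on
    share = commission_paid / gross_profit
    return markets_played > min_markets and share < max_commission_share

print(premium_charge_applies(250 + 537, 12.66, 130.09))  # commission 9.7% of profit: True
print(premium_charge_applies(250 + 527, 11.03, 48.9))    # commission 22.6% of profit: False
```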

Why PC was introduced depends on your perspective. Suppose you are a punter who backs and lays. You may take the view that it was introduced to penalise traders who make heavy use of the system. Such traders generate profits with shorter losing periods than punters and, as a result, remove liquidity from the market. Given the low commission they pay relative to their profits, you may feel an extra payment is called for. If, on the other hand, you are a trader paying PC, you may feel it is just a cynical Betfair grabbing extra income wherever it can. Also, do not forget that Betdaq do not charge PC and have equal if not lower commission rates.

When this topic comes up on Twitter, there are always people popping up to point out that punters should not bet on the exchanges but stick with bookmakers because of PC. This is of course nonsense. It is akin to saying that if your boss offers you a pay rise to 20k per year, you should turn it down because the rich get taxed at 45%. These objectors are invariably traders or in-running players, not Joe BackorLay.

So how does Hugh Taylor come into all this? I decided to run a simulation on Hugh’s selections from 2014 (year chosen randomly) to see at what level of performance he would start paying PC. I will assume that at the start of 2014 he has played in 250 markets and is in profit. I will also assume that he backs his selections on Betfair at the prices he advises (not literally, so don’t get upset, Hugh, if you are reading this). During this year Hugh made a thirty-odd percent ROI to his prices. To look at various thresholds, I randomly remove winners in steps to gradually bring the ROI down. Each step is run 100 times, so the eliminated winners are randomly distributed, and the average profit over the 100 runs is taken. Here are the results:

Remove 10% of winners
Bets 537, PL +130.09, ROI +24.22%, Commission paid 12.66, Commission 9.73% of profit

With 10% of winners removed, Hugh is paying 9.73% of his gross profit as commission and is eligible for PC.

Remove 12% of winners
Bets 534, PL +106.5, ROI +19.94%, Commission paid 12.19, Commission 11.43% of profit

Remove 14% of winners
Bets 532, PL +93.2, ROI +17.52%, Commission paid 11.92, Commission 12.78% of profit

Remove 16% of winners
Bets 530, PL +78.86, ROI +14.87%, Commission paid 11.63, Commission 14.75% of profit

Remove 18% of winners
Bets 529, PL +70.1, ROI +13.25%, Commission paid 11.46, Commission 16.34% of profit

Remove 20% of winners
Bets 527, PL +48.9, ROI +9.28%, Commission paid 11.03, Commission 22.5% of profit

From the above, Hugh would hit PC at an ROI above 10%. So if you are a punter having around 600 bets per year, i.e. roughly two per day, and making more than 10% profit, you may run into PC. The number of people capable of doing this on Betfair will be very small. Anyone in this category will no doubt have the nous to generate bets and lays that break even on PL but crank up commission paid. If you cannot be bothered to do this, simply bet on Betdaq.

I will finish by looking at the first of those options. Suppose Hugh found around 1,100 even-money bets during 2014 that broke even net of commission. He would then avoid PC as long as his ROI was 21.3% or less. Of course he is not restricted to even-money shots; I use them for ease of calculation. Now if you are making 21% on Betfair, you should either stop worrying or apply for Hugh’s job.
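The remove-winners simulation described earlier in this post can be sketched as follows. Hugh’s 2014 bets are not reproduced here, so the bet list is hypothetical: 100 winners and 400 losers, all at decimal odds of 7.0, which keeps the arithmetic easy to check. Commission is taken at an assumed 5% on each winning bet’s profit. That is a simplification, since Betfair actually charges commission on net winnings per market. Commission as a share of profit is computed from the averaged figures.

```python
import random
from statistics import mean

def simulate_removal(bets, remove_fraction, runs=100, commission_rate=0.05, seed=0):
    """Randomly delete a fraction of the winning bets, `runs` times, and average
    the results. bets: list of (decimal_odds, won) at level 1-point stakes."""
    rng = random.Random(seed)
    winners = [i for i, (_, won) in enumerate(bets) if won]
    n_remove = round(len(winners) * remove_fraction)
    pls, commissions = [], []
    for _ in range(runs):
        removed = set(rng.sample(winners, n_remove))
        kept = [bet for i, bet in enumerate(bets) if i not in removed]
        winnings = sum(odds - 1 for odds, won in kept if won)
        commission = commission_rate * winnings  # simplified: per winning bet
        losses = sum(1 for _, won in kept if not won)
        pls.append(winnings - commission - losses)
        commissions.append(commission)
    n_bets = len(bets) - n_remove
    avg_pl, avg_comm = mean(pls), mean(commissions)
    comm_pct = 100 * avg_comm / avg_pl if avg_pl > 0 else float("inf")
    return {"bets": n_bets, "pl": avg_pl, "roi": 100 * avg_pl / n_bets,
            "commission": avg_comm, "comm_pct_of_profit": comm_pct}

# Hypothetical record: 100 winners and 400 losers, all at decimal odds of 7.0.
bets = [(7.0, True)] * 100 + [(7.0, False)] * 400
for fraction in (0.10, 0.12, 0.14, 0.16, 0.18, 0.20):
    r = simulate_removal(bets, fraction)
    print(f"Remove {fraction:.0%} of winners: bets {r['bets']}, PL {r['pl']:+.2f}, "
          f"ROI {r['roi']:+.2f}%, commission {r['commission']:.2f} "
          f"({r['comm_pct_of_profit']:.2f}% of profit)")
```

With real bets at varying odds, the random choice of which winners disappear changes the profit from run to run. That variation is why averaging over 100 runs is worthwhile.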

All that counts is Profit, ROI is for the ego.

Lower Class Races: Don’t Believe Me, Believe Hugh

I ran a Twitter poll today asking whether punters faced with low-grade and high-grade handicaps should avoid the low-grade races or embrace them. Plenty of people still believe the media numpties who spew out the daily nonsense that low-grade handicaps are unreliable betting mediums. Perhaps they do it to mask their lack of betting nous, or perhaps to encourage you to lose more. (Currently on Twitter, 70% think low grade is OK and 30% think it should be avoided.)

I considered chucking some data at you to try to persuade you that low-grade races are perfectly good betting mediums and that you should not steer away from them. Instead I decided to take a different tack, as statistical analysis is not everybody’s cup of tea.

No doubt you have heard of Hugh Taylor. He is a rare breed: a media tipster who actually publishes all his past tips and results. Most do not, no doubt because their records would be so god-awful that people would start to question how they stay in work. Hugh is pretty good, to say the least, as his record shows. You would think that if low-grade races are unreliable, Hugh would steer clear of them. Of course this is difficult, because on plenty of days all Hugh faces is low-grade racing. Forced to consider this money-sponging class of race, you might therefore expect Hugh to struggle to make them pay, or at least to perform worse.

Below is his tip record by class for the years I have at hand (not cherry-picked). Low class means class 5 and above; high class means class 4 and below.

2017 Class 5+ Bets 246 PL 95.95 ROI 39%
2017 Class 4- Bets 272 PL 97.3 ROI 35.79%

2016 Class 5+ Bets 241 PL 45.74 ROI 18.97%
2016 Class 4- Bets 286 PL 128.4 ROI 44.91%

2015 Class 5+ Bets 231 PL 147.35 ROI 63.7%
2015 Class 4- Bets 290 PL 22.25 ROI 7.67%

2014 Class 5+ Bets 234 PL 47 ROI 20.08%
2014 Class 4- Bets 283 PL 132.5 ROI 46.8%

Class 5+ Bets 952 PL 336.04 ROI +35.2%
Class 4- Bets 1131 PL 380.7 ROI +33.6%

Virtually no difference; in fact, low class is slightly, but not significantly, ahead. OK, so do you still think low-class racing is unworthy and unreliable?

The Rules of Betting

A recent Pinnacle article set me thinking about what advice I would pass on to punters wanting to flatten the curve (apologies for the steal). How can punters move their negative expectancy towards a positive one, even if they do not quite get there? I have said many times that punters who lose less are more likely to stay in the game, as the prospect of winning seems closer. God knows racing needs more punters to stay in the game at the moment.

Here are my tips; I will add to them as time goes by.

1. Log or retrieve your bets so you know exactly how much you are losing in terms of ROI%. This is very easy to do these days, with online accounts letting you download account details. Betfair is wonderful for this, giving you full account bet details.

2. Using the record from point 1, work out how much you would lose betting randomly at your average bet price, and hence how steep the incline ahead is. Say random betting at your average price would lose 12%. If you are also losing 12%, you are doing no better than a pin sticker.

3. Interrogate past results using software or online resources and get a feel for one or two ideas that will flatten the slope. In other words, if you tend to lose, say, 7%, how can you apply basic rules to get that down to 5% or maybe less? You can also use bet analysis software that analyses your own bets and may highlight strengths and weaknesses. Maybe you do well in sprints but poorly in other races. Be careful of sample sizes, though, unless you can articulate a reason for the pattern. For example: I analyse draws, and perhaps this gives me a push in the right direction in sprints. Example of software below

4. Always decide what you are going to bet at the beginning of the day and, once you have decided, stick with it. It sounds inflexible, but the removal of loss chasing will outweigh that. Place your bets in the morning and then walk away.

5. ALWAYS take the best available price, and use Betfair to gauge whether to take it now or wait. How does this work? Say you fancy X and the best bookmaker price is 5/1, but on Betfair it is 5.5/1. Then wait or, if time does not allow that, take BFSP. If, on the other hand, Betfair shows 5.0/1 or less, take the bookmaker price if you can. If time does not permit this kind of work, just take Betfair SP (see previous article).

6. Track the price you took against the closing price, i.e. what price did I take and what was the Betfair SP? You may not be winning initially but still be significantly beating the closing price. If so, keep going, sit back and wait, with the small crumb of comfort that although you are not yet winning, you will be eventually.

7. Do not be afraid of lower-class races; see the blog following this one.
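Point 2 can be sketched as a “pin sticker” baseline. Take a record of past runners with their prices and results, and back every runner priced close to your own average price at level stakes. Then compare that ROI with your own. The record, the ±20% price band and the decimal-odds format are assumptions for illustration. A real baseline needs thousands of runners, not seven.

```python
def random_betting_roi(history, avg_price, band=0.20):
    """Level-stakes ROI% from backing every past runner priced within
    +/- band (as a proportion) of avg_price. history: (decimal_price, won)."""
    low, high = avg_price * (1 - band), avg_price * (1 + band)
    sample = [(price, won) for price, won in history if low <= price <= high]
    if not sample:
        return None
    profit = sum(price - 1 if won else -1 for price, won in sample)
    return 100 * profit / len(sample)

# Hypothetical results record and a punter whose own ROI is -12%.
history = [(5.0, True), (6.0, False), (5.5, False), (6.0, False),
           (5.0, False), (20.0, True), (4.5, False)]
baseline = random_betting_roi(history, avg_price=5.5)
my_roi = -12.0
print(f"Pin-sticker ROI {baseline:.1f}%, mine {my_roi:.1f}%, "
      f"edge {my_roi - baseline:+.1f} pts")
```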

Of course, point 6 comes with the caveat that you will pretty quickly be banned by bookmakers, so working to Betfair will be the only real option. Do not be put off by this; you can still be profitable to Betfair.
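Point 6 only needs a log of the price you took and the Betfair SP for each bet. Below is a minimal sketch of that tracking. It reports the average ratio of price taken to BFSP and the share of bets that beat BFSP. Both are simple metrics of my choosing, and commission is ignored. The bet log is hypothetical and uses decimal odds.

```python
from statistics import mean

def closing_price_report(bets):
    """bets: list of (price_taken, bfsp) as decimal odds.
    An average ratio above 1 means you typically beat Betfair SP."""
    ratios = [taken / bfsp for taken, bfsp in bets]
    beaten = sum(1 for taken, bfsp in bets if taken > bfsp)
    return {"avg_price_ratio": mean(ratios),
            "pct_beating_bfsp": 100 * beaten / len(bets)}

# Hypothetical bet log: (price taken, Betfair SP).
log = [(6.0, 5.0), (4.0, 4.4), (10.0, 8.0), (3.0, 3.0)]
report = closing_price_report(log)
print(report)
```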

Have a tip of your own, add a comment below


Jockey Efficiency Revisited

What makes a good jockey, or one jockey better than another? On aesthetic impression alone, we may be justified in saying that Lester Piggott was better than Willie Carson. As punters, though, we could not care less whether our jockey is a Kieren Fallon or a Cash Asmussen. What we want is a jockey who consistently gets the best out of a horse in terms of winning chance. One of the main contributors to this complex chance equation is how the horse distributes its energy, i.e. how efficiently it runs. Picture a front runner challenged early for the lead, whose jockey is too scared of disobeying instructions to take a pull and contests the lead instead. You know that, more than likely, your chance of winning is sunk.

Total Performance Data provide data on sectional times, and ATR, under the guidance of Simon Rowlands, are producing assessments of race efficiency for each horse. This allows us to look at how various jockeys perform on their efficiency scores. There are numerous problems with this approach. One major confounder is that better jockeys may get better, more controllable rides. To dampen this down, I split the data by jockey strike rate, so that we compare jockeys from similar strike-rate brackets.

Taking the average efficiency score (lower is better) for each jockey over the last 2 years we have the following results.

Jockey strike rate 21.7% down to 15%

W Buick 51.78
S De Sousa 53.57
Jim Crowley 54.16
B A Curtis 55.58
Oisin Murphy 55.76
A Kirby 58.79
Jack Mitchell 60.19
Dane O’Neill 60.40
R Havlin 61.96
R L Moore 66.33
D Tudhope 69.83
James Doyle 70.11
J Fanning 72.40
Andrea Atzeni 76.62

Jockey strike rate 15% down to 12%

S M Levey 49.120
William Carver 49.672
R Kingscote 50.033
M Harley 51.964
C Lee 52.325
Joshua Bryan 52.814
Hollie Doyle 53.468
Ben Curtis 54.215
P McDonald 54.618
Callum Rodriguez 54.909
F Norton 54.975
Jason Watson 55.636
Jason Hart 55.889
Kevin Stott 57.717
Cieren Fallon 57.833
P Cosgrave 59.139
Harry Bentley 62.918
P J McDonald 63.615
N Mackay 65.232
Megan Nicholls 67.551
P Hanagan 69.040
Edward Greatrex 74.516
J P Spencer 82.389

Jockey strike rate 12% down to 10%

Sebastian Woods 43.76
Connor Beasley 47.39
D Nolan 50.11
P Makin 50.18
Georgia Dobie 50.41
D Allan 50.46
Finley Marsh 51.79
Cameron Noble 52.15
Rossa Ryan 52.69
Alistair Rawlinson 53.55
David Probert 53.59
Tom Marquand 54.41
George Rooke 54.91
Barry McHugh 55.3
Harrison Shaw 55.51
Hayley Turner 57.76
Hector Crouch 59.69
Adam J McNamara 60.45
David Egan 60.86
Poppy Bridgwater 61.49
Seamus Cronin 62.03
Jamie Gormley 64.44
P J Dobbs 64.72
Kieran Shoemark 66.65
R Winston 68.09
P Mulrennan 70.31
G Mosse 72.17
Charles Bishop 75.05
Ben Sanderson 77.14
Thomas Greatrex 90.79
L Morris 391.80

Interestingly, if we leave out the final outlier in category 3, the average efficiency scores are higher (worse) for the higher strike-rate jockeys than for the other two categories. Perhaps the fact that better jockeys ride more fancied horses plays a part. But if I look only at horses going off at less than 11.0 on Betfair SP, we have the following average efficiency scores:

Top jocks 62.37
Middle jocks 54.94
Lower jocks 52.7
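Here is a minimal sketch of how bracket averages like these could be produced from ride-level data, with an optional BFSP cut-off. Several details are assumptions: the field names, the handling of the 15% and 12% boundaries (15% counts as top, 12% as middle), and averaging over rides rather than over jockeys, since the post does not say which. The rides are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def strike_rate_bracket(strike_rate):
    """Brackets from the post; boundary handling is an assumption."""
    if strike_rate >= 15:
        return "top"
    if strike_rate >= 12:
        return "middle"
    if strike_rate >= 10:
        return "lower"
    return None

def bracket_efficiency(rides, max_bfsp=None):
    """Average efficiency score (lower is better) per strike-rate bracket,
    optionally keeping only rides that went off below max_bfsp."""
    scores = defaultdict(list)
    for ride in rides:
        if max_bfsp is not None and ride["bfsp"] >= max_bfsp:
            continue
        bracket = strike_rate_bracket(ride["jockey_sr"])
        if bracket is not None:
            scores[bracket].append(ride["efficiency"])
    return {bracket: mean(values) for bracket, values in scores.items()}

# Hypothetical rides.
rides = [
    {"jockey": "Jockey A", "jockey_sr": 18.0, "efficiency": 60.0, "bfsp": 4.0},
    {"jockey": "Jockey A", "jockey_sr": 18.0, "efficiency": 70.0, "bfsp": 15.0},
    {"jockey": "Jockey B", "jockey_sr": 13.0, "efficiency": 50.0, "bfsp": 8.0},
    {"jockey": "Jockey C", "jockey_sr": 11.0, "efficiency": 55.0, "bfsp": 6.0},
    {"jockey": "Jockey C", "jockey_sr": 11.0, "efficiency": 45.0, "bfsp": 10.5},
]
print(bracket_efficiency(rides))
print(bracket_efficiency(rides, max_bfsp=11.0))
```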


Betfair Blasts Cheltenham

Cheltenham has just finished for another year. Guinness consumption has gone through the roof, Corona beer consumption is lying in a Mexican gutter and the newly launched Tote has finished its first festival with some appalling results.

During the festival, Betfair SP outperformed bookmakers’ SP and Tote returns in every single race. The Tote had the audacity to brag on Twitter about the 162.6 return on the 66/1 winner of the Foxhunters, yet punters would have been rewarded with 224.95 on Betfair. It Came to Pass is exactly what punters should be doing when it comes to the Tote: pass, and bet at Betfair SP if price juggling is not your thing.

How much worse off would you have been? If you had backed every winner at the meeting at Betfair SP, you would have won +579.01 counting stakes, whilst on the Tote you would have won +453.62.

The Tote was inferior to bookmaker SP in 20 of the Cheltenham races and beat bookmaker SP in only 8. The Tote do match bookmaker SP when it is bigger, but these figures suggest the Tote is leaning towards being mainly an SP-based service.
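Comparisons like these take only a few lines once each winner’s returns are logged. Here is a minimal sketch. Each figure is treated as the return to a 1-point stake, stake included. The Foxhunters figures are the ones quoted above, with 66/1 written as 67.0, and the other two races are hypothetical. Following the count above, raw Tote dividends are compared with bookmaker SP, ignoring the SP match.

```python
def compare_returns(races):
    """Total each method's returns across races and count how often the raw
    Tote dividend beat, or fell short of, bookmaker SP."""
    totals = {method: sum(race[method] for race in races) for method in ("bfsp", "tote", "sp")}
    tote_better = sum(1 for race in races if race["tote"] > race["sp"])
    tote_worse = sum(1 for race in races if race["tote"] < race["sp"])
    return totals, tote_better, tote_worse

races = [
    {"race": "Foxhunters", "bfsp": 224.95, "tote": 162.6, "sp": 67.0},  # 66/1 winner
    {"race": "Hypothetical race 2", "bfsp": 5.6, "tote": 4.9, "sp": 5.0},
    {"race": "Hypothetical race 3", "bfsp": 3.4, "tote": 2.8, "sp": 3.0},
]
totals, tote_better, tote_worse = compare_returns(races)
print(totals, tote_better, tote_worse)
```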

Why is this important? For a number of reasons. Firstly, it sticks in my throat that Betfair SP hardly ever gets mentioned by the media. The Racing Post never report it, and on the TV channel the fool in the ring would rather run naked amongst the pitches than mention Betfair prices. More importantly, if we want to attract people to racing, we must not kid ourselves that this is anything other than primarily a betting sport. To attract people it must feel solvable, even if for the vast majority it is not. Punters who lose their money more slowly are far more likely to stay engaged, try to improve, and hence stay in the sport. Getting the best possible return is vitally important in slowing down the attrition of losing punters.