Six Degrees of Separation

What do Sean Bean the actor and I have in common? At first I thought it might be that we are both from Sheffield, or maybe that he studied drama at Rotherham College of Art and Tech, where I taught for 4 years back in the 80’s. Even closer to home, my 81 year old cycling buddy in Portugal has a regular ‘last of the summer wine’ Friday meet up with his mates in Sheffield, one of whom used to be Sean’s dad until he passed away. It could also be that we both support Sheffield United and idolise Tony Currie, the best English midfield player of the 70’s.

But closer to home for me is that a playwright called Steve Wakelam, a Yorkshire lad, wrote a play back in the late 70’s about two young lads who try their arm at professional punting. I know Steve, although my friend, the guy who introduced me to Racing, knows him better, having been taught by him when Steve was a school teacher. I met Steve on several occasions at our annual York races soiree in August and I have always been aware that the play was based on my friend John and myself. What I did not know until recently is that it was filmed as a BBC1 play with Sean playing what appears to be the third lead role (alas not John or me). I have not seen it but at least it would be one role in which Sean would not have a problem with the accent.

I do not think the two parts are quite distinct in terms of me and my mate John; rather they appear to be an amalgamation of both of us. My friend did work as a groundsman at a monument and does have a more romantic view of Racing, whilst I hold the more hard nosed Mathematical viewpoint. The second character, who appears to be the proverbial loser along for the ride, is hopefully purely fictional.

AE Ratings V Random Forests

I have spent the last few days working on a Random Forest version of my own flat handicap ratings. The original ratings are based on AE values, or Actual divided by Expected values to give them their full name. Let me remind you of what AE values are. Say we are calculating the AE value of last time out (lto) winners. We look at all lto winners and for each horse calculate its market chance by taking its SP or BFSP, stripping out the over round and then converting the odds into a chance of winning. So an even money shot should win 0.5 times if the odds are true. We sum up all these win chances along with the actual win count for these horses. This gives an E (expected) value and an A (actual) value. If we divide the A value by the E value and it is greater than 1 then, in our example, last time out winners are winning more times than the market estimates. If the value is less than 1 then the market is over betting them.
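For anyone who prefers to see it in code, here is a minimal sketch of the AE calculation. It assumes decimal odds and strips the over round by simple proportional normalisation, which is only one way of doing it. The function names, the dictionary keys and the two-race sample are all made up for the illustration.

```python
def implied_chances(decimal_odds):
    """Convert one race's decimal odds to win chances that sum to 1.

    Assumption: the over round is removed by proportional normalisation.
    """
    raw = [1.0 / o for o in decimal_odds]
    book = sum(raw)  # e.g. 1.02 for a 102% book
    return [r / book for r in raw]


def ae_value(races, qualifies):
    """Actual/Expected ratio for the runners picked out by `qualifies`.

    races: list of races, each a list of dicts with keys 'odds' (decimal)
           and 'won' (bool), plus whatever `qualifies` needs to read.
    """
    actual = 0.0
    expected = 0.0
    for race in races:
        chances = implied_chances([r["odds"] for r in race])
        for runner, chance in zip(race, chances):
            if qualifies(runner):
                expected += chance
                actual += 1.0 if runner["won"] else 0.0
    return actual / expected if expected else float("nan")


# Hypothetical two-race sample; 'lto_win' flags last time out winners.
races = [
    [{"odds": 2.0, "won": True, "lto_win": True},
     {"odds": 2.0, "won": False, "lto_win": False}],
    [{"odds": 4.0, "won": False, "lto_win": True},
     {"odds": 4.0 / 3.0, "won": True, "lto_win": False}],
]
ae_lto = ae_value(races, lambda r: r["lto_win"])
```

In this toy sample the two lto winners carry chances of 0.5 and 0.25, so E is 0.75, and one of them won, so A is 1 and the AE value comes out above 1.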

I trained a Random Forest model on my data for 2009 to 2013 and then tested on the years 2014 and 2015. The original AE model produced the following results for top rated horses.

Bets 7918 Wins 127` PL +255 to BFSP after comm’ ROI +3.2%

The Random Forest model produced the following results for top rated horses

Bets 7699 Wins 1164 PL +323 to BFSP after comm’ ROI +4.1%

The software used was Python with the Scikit Learn Random Forests library. See my intro blog entry on this software.

The initial interest in this area stems from an excellent article published by Stefan Lessmann, which is linked below.

The next step for me is to extend the model by following the example of Lessmann and co and moving to a second step, in which the resulting RF ratings are combined with the market price of each horse using regression to eventually produce an oddsline. Of course BFSP is not known until after the off but final prices can be a good estimation. Lessmann and Benter argue for this two step separation of fundamental race parameters and odds to stop the odds swamping the model parameters, as they would if both were used together at the same time.
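A rough sketch of how the two steps might hang together in Scikit Learn follows. It runs on synthetic data, every name and number in it is invented, and a plain logistic regression is used as a simple stand-in for the second step rather than the per-race conditional logit style models used in the literature. Note that the second step is fitted on ratings for races the forest has not seen, otherwise the rating looks better than it really is.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 "fundamental" features per runner, a market
# probability and a win flag. None of this is real racing data.
n = 2000
X = rng.normal(size=(n, 3))
market_prob = np.clip(rng.beta(2, 8, size=n), 0.01, 0.99)
won = (rng.random(n) < market_prob).astype(int)

# Three blocks: step 1 fit, step 2 fit, final test.
X1, X2, X3 = X[:1000], X[1000:1500], X[1500:]
y1, y2 = won[:1000], won[1000:1500]
m2, m3 = market_prob[1000:1500], market_prob[1500:]

# Step 1: fundamentals only, no odds in the feature set.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X1, y1)


def step2_features(features, market):
    """Stack the RF rating with the log of the market probability."""
    rating = rf.predict_proba(features)[:, 1]
    return np.column_stack([rating, np.log(market)])


# Step 2: combine the rating with the market price.
combiner = LogisticRegression()
combiner.fit(step2_features(X2, m2), y2)

final_prob = combiner.predict_proba(step2_features(X3, m3))[:, 1]
oddsline = 1.0 / final_prob  # decimal odds, not yet normalised within a race
```

A real version would also need to scale the final probabilities so they sum to 1 within each race before turning them into an oddsline.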

I should also perhaps look at some more metrics on this model first, as it may not have escaped your notice that the win rate on the AE model is greater than that of the RF model. Lessmann puts up some strong arguments for Random Forests in his article so if you are interested in race modelling it might be worth taking a look.

Watching Frankel

Today saw the first son of Frankel make his debut in the UK and this also coincided with my finishing the sequel to that excellent book Watching Racehorses by Geoffrey Hutson, the obviously named Watching more Racehorses.

I loved the first book, which attempted to numerically represent those soft subjective observations we get thrown at us every weekend by so called paddock watchers. The new book is not as good simply because it is padded out somewhat with observations on areas outside the paddock. Nevertheless it still adds more data to some of those familiar and unfamiliar areas of paddock watching. For example in the first book sweating is not cited as a negative, but in the second he puts more meat on this observation by stating that when the temperature is above 21C sweating is not a negative. Another interesting observation is that coltishness is also not a negative. So what is a negative? Well, if you want a negative you can get your teeth into sample size wise then consider cross nose bands.

How does this all relate to Cunco, the son of Frankel who has just bolted in? Well he drifted like a barge after becoming coltish in the parade ring. Only he and Mr Hutson seemed to know.

Betfair SP’s Part 2

In my previous blog post I mentioned the care needed when doing research to Betfair SP. This was courtesy of an alert by an observant member of the SmarterSig email forum.

Today I will demonstrate just how much difference this anomaly can make. At the moment I am tracking a betting method based on a combination of racing selection strategy and financial trading methods. At first glance the option of betting to BFSP seemed more attractive than taking a price, provided you can find some sort of stake threshold below which you do not cannibalise your own BFSP with the size of your stake.

Using the odds displayed when you download your betting summary to calculate a level stake PL to BFSP I get the following results when comparing backing the selections to available price compared to BFSP.

Available price, Bets = 1717 PL = +27.3 points after comm

SP Price, Bets = 1717 PL = +35.2 points after comm

A small increase using BFSP.

Of course things are never that simple and the prices handed to me via the Betfair download do not account for R4’s. Now calculating the profit as the winnings divided by the bet stake we get the following profit for the two categories

Available price, Bets = 1717 PL = -3.04 points after comm

SP Price, Bets = 1717 PL = +22.1 points after comm

The profit from the live prices simply has not survived the R4’s that occurred between taking the bets in the last few minutes and the off time. The BFSP’s however will have fewer R4’s, perhaps only being affected by markets that have not reformed, for example due to a stall non entry. There was a 0.7% drop in ROI when the R4’s on BFSP were accounted for.

Conclusion – You need to make sure when calculating points profit on bet summaries that you use the profit divided by stake and not the price to calculate. Also when assessing new strategies retrospectively to BFSP you need to account for late R4’s. A reduction of 1% on ROI would seem prudent.
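To make the point concrete, here is a small sketch of the profit divided by stake calculation. It assumes each line of the bet summary can be boiled down to a stake and a settled profit figure, and it applies a flat commission rate to every winning bet, which is a simplification of how commission is really charged. The function name and the sample bets are hypothetical.

```python
def points_profit(bets, commission=0.05):
    """Level stake profit in points from settled bets.

    bets: iterable of (stake, profit) pairs, where profit is the settled
          amount from the summary (negative for a loser) and so already
          reflects any Rule 4 deduction.
    commission: assumed flat rate charged on each winning bet.
    """
    total = 0.0
    for stake, profit in bets:
        points = profit / stake
        if points > 0:
            points *= 1.0 - commission
        total += points
    return total


# Hypothetical summary: a 2 pt winner that settled at 5.00 profit after a
# Rule 4 (the price taken, 4.0, would have implied 6.00), and two losers.
bets = [(2.0, 5.0), (2.0, -2.0), (4.0, -4.0)]
pl = points_profit(bets)
```

Working from the displayed price of 4.0 would have credited the winner with 3 points before commission, whereas the settled figures give 2.5, which is exactly the gap that R4’s open up.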

Betfair SP’s

I have mentioned before the fact that Betfair SP’s seem to produce a race overround, or should I say underround, below 100% on a good number of races. A possible explanation for this was put forward by a member of the SmarterSig email forum.

The member stated the following which I have to admit I had overlooked.

I assume that I’m not alone in using Betfair SPs as the benchmark to assess the profitability of a potential new system.  Of course there’s an argument that this isn’t entirely accurate as your own theoretical bets might have altered the BFSP but nothing is perfect.

However, I recently noticed that Betfair SPs are NOT recalculated to allow for Rule 4 deductions after a late withdrawal.  I know that Betfair apply their own deduction to any bets (whether a price was taken or BFSP) but had assumed that SPs would be re-normalised after the race to account for this.  Unfortunately it seems they are not, and the historical BFSPs that are released by Betfair in CSV format (or on the Timeform website) do not account for withdrawals.  As far as I know there’s no easy way to get this information, so it means that the profit/loss of any system researched using Betfair SPs is flawed because of this.

The other thing to watch is dead-heats as this will also affect the bottom line.  It’s relatively straightforward to calculate in Win markets but it becomes more complex in place markets.  For instance if there’s a 6-runner race with 2-places paid out in the place market and your horse dead-heats for first place then the dead heat is irrelevant.  It will be treated as a full-stake bet.  However if another horse wins and your horse dead-heats for second in the place market then your return will be calculated to a half stake.  In other words there’s no ‘one size fits all’ solution for dealing with dead-heats.

The main point is about the Rule 4’s though.  Just wondering if anyone else has dealt with this issue before?  it’s hard to assess how much it might actually affect the ‘true’ bottom line of a researched sequence of 1000s of bets.

Clearly a pinch of salt is needed when assessing any betting approach to Betfair SP, which these days is the common method used. There is probably less of a problem with NH racing as the bulk of late withdrawals will be stall problems on the flat.
One possible simple solution would be to readjust all prices so that a book which comes out at sub 100 is normalised to around 101 or 102%, which is more likely to reflect the true BFSP after R4’s. If there is any interest in this topic I could look into it further. If the writer of the above is correct then there should be some whopping underrounds when an even money shot gets withdrawn at the stalls.
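As a rough idea of what that readjustment could look like, the sketch below shortens every price in an under-round race by the same proportion until the book sums to a chosen target. The 101% target, the function name and the example race are all assumptions, and scaling every runner equally is cruder than the reduction factors Betfair actually applies.

```python
def renormalise_book(bfsp, target=1.01):
    """Rescale a race's decimal prices so the book sums to `target`.

    Only books below 100% are touched, on the assumption that an
    under-round signals a late withdrawal not reflected in the prices.
    """
    book = sum(1.0 / p for p in bfsp)
    if book >= 1.0:
        return list(bfsp)
    scale = book / target  # below 1, so every price shortens
    return [p * scale for p in bfsp]


# Hypothetical race where an even money shot came out late: the three
# remaining runners make only a 50% book.
adjusted = renormalise_book([4.0, 8.0, 8.0])
new_book = sum(1.0 / p for p in adjusted)
```

Here the 4.0 shot is cut to a little under 2.0, which shows just how much a researched profit figure could be overstated in such a race.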

Do The Survey

If you have ever had an account closed or restricted, and I guess that’s just about everyone, (If you have not then you are doing something terribly wrong with your betting) then please do the HBF survey into the problem and maybe just maybe we can get something done. Please tweet, blog, wear a T shirt and anything it takes to get the word around

Account Restriction/Closure Survey

Which would be more lucrative to Racing and even to Bookmakers

1. Restrict and close all punters that beat prices and show any inkling of a brain and hence see the betting turnover shrink as existing punters turn away and new punters do not arrive. Kind of like a greater ROI but lower turnover for racing and bookies.

2. Allow an Aussie type system where winners are allowed to bet up to certain individual bet liabilities. Winners now feel free to be more visible which in turn encourages new players into the sport who of course in the majority will lose. Net result lower ROI but much greater turnover = greater profit than option 1

In many ways the above scenario is rather like betting on Exchanges V betting with bookmakers. With Bookmakers you can make a greater ROI but you cannot get much on if anything at all. When you switch to bet on the Exchanges you have to learn that ROI will shrink so turnover must be vastly increased.

Oddslines a Line In The Sand

I have been busy this week not just watching Cheltenham but also working on a Bayesian set of flat handicap ratings, or should I say a Bayesian version of my flat handicap ratings. Two things have prompted me in this direction. First of all my ratings do well but less well towards the off time of races, which is a pity because the greatest liquidity on Betfair exists the closer you get to the off. My reasoning therefore is to try and produce the most efficient oddsline with the minimum amount of time and effort, and with Bayes producing natural percentage chances of winning it is hoped that this may prove a way of cranking up the action towards the off.

The second reason is that I have already had some success in this area and hence the title of this post. No I do not have a wonderful oddsline approach to AW racing as the title suggests. What I have done, in a completely unrelated area to my ratings, is produce a novel oddsline approach. Now forgive me, for obvious reasons, if I do not spell out the nuts and bolts of this method, but I do feel there is a more general message which is worth passing on and could be of use to some of you.

My ‘novel oddsline’ approach initially did not generate profits, if memory serves me right I think it produced a small loss on Betfair. What I did next was simply to move the dividing line which specified bet or no bet. In other words where I had previously simply bet if the odds were greater than X I changed this to bet where the odds are greater than X + y%. I moved the line in the sand so to speak. This has now generated around +6% after commission profit off around 5,000 bets all under 10.0 on Betfair.
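The general idea of moving the line can be written down in a few lines of Python. In this sketch X is taken to be the oddsline price for the horse, the margin is whatever percentage you choose to add on top, and the 10.0 ceiling mirrors the price range mentioned above. The function name and the example numbers are purely illustrative and have nothing to do with the method itself.

```python
def is_bet(available_odds, oddsline_price, margin_pct=0.0, max_odds=10.0):
    """Bet only when the price on offer clears the oddsline by margin_pct."""
    threshold = oddsline_price * (1.0 + margin_pct / 100.0)
    return threshold < available_odds < max_odds


# With no margin a 4.2 price against a 4.0 oddsline is a bet. Asking for
# 10% over the oddsline moves the line to 4.4 and the bet disappears.
before = is_bet(4.2, 4.0)
after = is_bet(4.2, 4.0, margin_pct=10.0)
```

Raising the margin trades away bets for a better average edge, so the right value is something to find by testing on past results rather than guessing.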

For older members of SmartSig this may well have a familiar ring to it. Some of you may recall an excellent article in the original SmartSig mag’ by an author called Filly. In this article Filly’s main point, if memory serves me well, was that just about any sensible approach to ranking or selecting horses could be made profitable by moving the threshold from a price point of view. At the time I was not sure whether I could buy into this. Some members were even wondering whether a forecast such as the Racing Post could be used and then, using an appropriate percentage above the forecast, a profit would ensue.

It seemed at the time to be pushing the boundaries of acceptance but the more experienced members of the group were more in the Filly camp of thinking than out. In fact I seem to recall that it either won the article of the year award or came second.

I will keep you informed of progress on the Bayes approach but in the meantime remember not all fillies are bad bets.

If you recall the article or have some personal opinion on oddslines in general please leave a comment. All bloggers appreciate some sort of feedback.

The Humble KNN

I hope you got something out of the intro session on machine learning and Python. I think the KNN model is often used as a first sample of ML because it is fairly easy to get your head around what is happening under the bonnet. This begs the question, how important is it to know what is going on under the bonnet? Can we treat these ML algorithms as black boxes or do we need some understanding of the underlying mechanism?

Part of an answer to this question came when I progressed further with my ML investigation. I first thought that using a KNN method might be well suited to trainer patterns. Let’s face it, we are always wondering if a certain trainer does well with certain characteristics of a horse’s profile, and are we not always told that trainers are creatures of habit, repeating the same winning strategies time and again?

I did a run down of the trainers with the most prolific number of runners in handicaps. The first thing that struck me about this data was the U shaped curve it had in terms of losses blind betting their runners. In other words if you backed all runners, trainers with poor strike rates lost you more than medium strike rate trainers but it also became poor again when looking at high strike rate trainers. The upper levels are I presume so well known that the public simply overbet them. I decided therefore to select 10 trainers from the sweet zone in the middle who had the most runners in handicaps.

Using the same data from the previous exercise minus the trainer strike rate I discovered alas no significant profitable trends from the trainers. The model simply performed poorly. The exercise was not a complete waste of time however, if you are interested in focusing on the habits and run styles of a select few trainers I would suggest looking into the middle sweet spot. It may be that the trainers outside the top runner count are easier to churn a profit from.

One more green light went on in my head as a result of the KNN exercises and this proved far more promising, in fact that may prove to be a gross understatement.

I decided to take a look at the old chestnut of predicting when a price taken during the live 10 minute betting shows is a price that will beat Betfair SP. I was looking at prices from UK flat handicaps taken every 7 seconds. The data used in the model was purely technical. Investopedia defines technical analysis thus:

“A method of evaluating securities by analyzing statistics generated by market activity, such as past prices and volume. Technical analysts do not attempt to measure a security’s intrinsic value, but instead use charts and other tools to identify patterns that can suggest future activity.”

This is in contrast to fundamental analysis, where factors external to the market are considered. An example in horse racing would be the jockey or trainer of a horse.

The good thing about this data in contrast to the previous sessions is that it was far more balanced in terms of outcomes. Around 47% of prices are inferior to Betfair SP in the range of under 10.0 (which is where I focused).

With K set at 11 the model produced 131,474 selections, of which 64.5% were correct in that the price taken beat or was equal to Betfair SP. This number of selections would be from approximately 3 weeks of racing. Yes it is a lot, but the model will make multiple suggestions on the same horse in the same race if at each point it is deemed to be a good bet to beat SP.
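For readers who want to see the shape of such a model, here is a minimal KNN sketch with K set at 11. The two features and the labels are randomly generated stand-ins, not the technical data used above, and the variable names are invented for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for "technical" features sampled from the live market,
# e.g. recent price change and traded volume. Not real data.
n = 3000
data = rng.normal(size=(n, 2))
# Label 1 = price taken beat or matched SP; loosely tied to feature 0 here.
beat_sp = (data[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)

data_train, data_test = data[:2000], data[2000:]
label_train, label_test = beat_sp[:2000], beat_sp[2000:]

knn = KNeighborsClassifier(n_neighbors=11)  # K = 11 as in the text
knn.fit(data_train, label_train)

predicted = knn.predict(data_test)
# Share of the "will beat SP" calls that turned out to be right.
picked = predicted == 1
hit_rate = (label_test[picked] == 1).mean()
```

The hit rate here only counts the rows the model flagged as bets, which is the same way the 64.5% figure is framed.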

This to my naive non trading eyes looked quite promising but things got better on two points. First of all I substituted Betfair SP with last live show; after all it is unlikely from a trading perspective that one would trade to BFSP. You would have to guess the amount and be in danger of cannibalising your own SP. Also given previous revelations about BFSP and the fact that around 40% of books to BFSP are under 100%, it seemed logical that final show would be a better proposition than BFSP. It proved to be the case but not by huge amounts. It added about 0.5% to the correct prediction score.

The second adjustment provided a more significant improvement. Using a Random Forest algorithm instead of the KNN upped the correct prediction rate to 69%.

Random Forests are an ensemble method of modelling. They use decision trees to model the predictions but they use a number of trees and then aggregate the results from the individual trees to give a final result. This makes them far less susceptible to over fitting on the data. I also used the RF as a simple black box. Python is useful in this way in that you can plug in the alternative model into your existing code as shown below.

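from sklearn.ensemble import RandomForestClassifier

# data_train is assumed to be the same training feature array used for the KNN model
rf = RandomForestClassifier(n_estimators=100)
rf.fit(data_train, winlose_train)

predicted = rf.predict(data_test)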

The rest of the code is essentially the same.
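To show why the swap is so painless, the sketch below runs a KNN and a Random Forest through exactly the same fit and predict calls. The data is synthetic and the variable names simply follow the snippet above, so treat it as an illustration of the plug-in idea rather than the actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
data = rng.normal(size=(1500, 4))  # synthetic features, not real prices
winlose = (data[:, 0] * data[:, 1] > 0).astype(int)

data_train, data_test = data[:1000], data[1000:]
winlose_train, winlose_test = winlose[:1000], winlose[1000:]

models = {
    "knn": KNeighborsClassifier(n_neighbors=11),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(data_train, winlose_train)  # same call for either model
    predicted = model.predict(data_test)
    scores[name] = (predicted == winlose_test).mean()
```

Only the line that creates the model changes, which is what makes it so easy to treat the algorithm as a black box.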

I would welcome the thoughts of any traders out there on this topic.

Profitable Punting With Python 1


I have prepared some introductory sessions on machine learning for horse racing using Python and Scikit Learn. You do not need previous experience of either of these two tools but it would help if you are at least familiar with some basic programming concepts. For example it would help if you know what a FOR loop is and what an assignment statement is, even if not in Python.

The main data file will be freely available until Tuesday 2nd February for those who showed an initial interest. After this it will be in the utilities section of the web site. A modest members fee will enable you to access it.

The instructions will be freely available to all at all times.

OK to get started you will need to have downloaded and installed Anaconda Python v3.4, see previous blog post for details.

Once this has installed create a folder in your anaconda folder called horseracing.

All comments, questions and feedback should be posted to this blog post, that way they can act as a FAQ source.

First of all download the following zip file, double click on it to reveal all the contained files and copy them into your horseracing folder.

The next step is to download the following file into your horseracing folder. When you click the link it will probably display the contents in your web browser. Just right click the display and you will have the option to save the screen data to a file.

You now have the required files. To get started first open an MSDOS command window (the black box type)

Now navigate to your anaconda folder using cd command eg cd anaconda

Kick start IPython Notebook by typing in ipython notebook and pressing return

Once notebook is loaded up you will be presented with a directory screen of folders. Double click on the horseracing folder (that you created) to go into that folder.

Now double click on the file ProfitablePuntingWithPython1.ipynb

Follow the instructions within the displayed notebook.

Profitable Punting with Python Intro

Hoping to run a series of sessions introducing Machine Learning for horse racing betting via Python’s Scikit Learn. I will be assuming that you have some rudimentary programming knowledge although that may not be in Python. In other words you have some understanding of what a loop is, what a variable is and what an array is even if not in Python.

What do we mean by Machine Learning? Essentially getting our computer to build a model of past racing data so that we can use this model to effectively predict the outcome of future race data.

If you are interested in participating then you will first of all need a version of Python installed along with the libraries we are going to use. Even if you already have a version of Python you can still install the version I am about to recommend, as I have done, into a folder of its own and run it from there even if it is not your default Python. I am going to suggest Anaconda because it comes with all the extras we will need such as IPython Notebook, so there is no need to fiddle around with separate installs.

Downloading the Anaconda Free version of Python is quite painless and has everything we are going to need included in the download.

I have the 3.5 version for 64 bit Windows on my PC, you can choose the appropriate version (32 bit or 64 bit) to download at

If you are not sure if your PC is 64 bit or 32 bit check the following

Check python is loaded OK by opening an MSDOS command window and navigating to your Anaconda folder using the cd command.

Once in the folder type in Python -V

It should display your version number.

If you are interested in this series and have installed Anaconda OK let me know with a brief comment below so I can gauge interest.

UPDATE – Hope to start things on February 1st when hopefully all will be ready. Note I have modified the above and I am now running Python 3.5. If you have 2.7 it might be best to uninstall using the uninstall.exe file and then install 3.5 from the Anaconda website.

When you have done this create, inside your anaconda folder, a new folder called horseracing. You can call it something else if you wish as long as you know that when I refer to the horseracing folder I mean your equivalent.

I will start the ball rolling with a new blog post called Profitable Punting with Python 1

Each blog entry will introduce the next session briefly and where to pick up things from and the comments section at the foot of each blog will act as a discussion board and troubleshooting.



