Newcastle AW Pace

It is perhaps a little early to be evaluating the pace angle at the new Newcastle AW track but I thought an update on how things are measuring up might be in order.

So far, working to BFSP before commission and using SmarterSig pre-race pace figures (and I emphasise pre-race here), we have the following figures:

Held up (prior race pace figure less than 1.4): Runs 372, Wins 31, PL -86 pts

Prominent (prior race pace figure greater than 1.4 and less than 3.2): Runs 308, Wins 35, PL +47.7 pts

Led (prior race pace figure greater than 3.2): Runs 114, Wins 9, PL -59.55 pts

Horses with a prominent run in their previous race, as opposed to being held up or leading, have done very well next time out at Newcastle so far, although the period covered is short and the sample sizes are very small.
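For anyone wanting to track these bands themselves, here is a minimal sketch of the banding and the per-band arithmetic. The function names are my own, and how a figure of exactly 1.4 or 3.2 is treated is an assumption, since the bands above only say "less than" and "greater than".

```python
# Band boundaries from the post; values exactly equal to 1.4 or 3.2 are
# assigned to "prominent" here, which is an assumption.
def pace_band(pace_fig: float) -> str:
    if pace_fig < 1.4:
        return "held up"
    if pace_fig <= 3.2:
        return "prominent"
    return "led"


def band_summary(runs: int, wins: int, pl: float) -> dict:
    """Strike rate and level-stake ROI (PL per point staked) for one band."""
    return {"strike_rate": wins / runs, "roi": pl / runs}


# Figures quoted above (BFSP before commission, 1pt level stakes)
results = {
    "held up": (372, 31, -86.0),
    "prominent": (308, 35, 47.7),
    "led": (114, 9, -59.55),
}

for band, (runs, wins, pl) in results.items():
    s = band_summary(runs, wins, pl)
    print(f"{band:10s} SR {s['strike_rate']:.1%}  ROI {s['roi']:+.1%}")
```

On the figures above this works out at strike rates of roughly 8.3%, 11.4% and 7.9%, and ROIs of about -23%, +15% and -52% for held up, prominent and led respectively.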

ITV More Of The Same

We are soon to get a new TV presentation team for Racing, with one or two familiar faces still in place. There is much debate at the moment, with sighs of relief that some have been left out and groans that others have not been included. The truth is that the ITV team is much like the next USA government, or indeed our own UK government. The faces change but the underlying driving force and message remain the same. The illusion is that you are getting something new, whilst the truth is that you are in for pretty much the same.

So what is the ‘same’, and why do I no longer listen to TV pundits, who in reality should come with a wealth warning?

First of all, we all know that the punditry will be geared towards protecting the main sponsors, namely the bookmakers, and this is even more so on a commercially run TV channel. This is not all bad if you learn to recognise how this anti-punter stance can work to your advantage. TV pundits provide one huge disservice to losing punters and, paradoxically, a positive service to winning or potentially winning punters. Pretty much all of them, to a man or woman, approach a race from the point of view of finding the winner. Sure, every now and then they mention that so and so is too short for a bet, but this is hardly value betting. Anyone who has managed to move from being a losing punter to a winning one over the years will without doubt have first suffered from the what's-going-to-win mentality inflicted on them by the media. Eventually you twig that those on the screen either do not know how successful punting works, or prefer the short-term safety of finding a few winners to trusting that the watching public will stick with them through riskier selection methods that actually yield a long-term profit. But let's not moan about this; the fact that your presenters are incompetent is a bonus. It keeps the majority of punters in the dark, which is where they need to be if you are to keep winning.

That last sentence might sound a little harsh, but the reality is that most punters have to lose in order for some punters to win. The truth is that they don't have to lose as much or as rapidly as they do. If every losing punter switched to the exchanges, they would immediately, as a group, lose less without harming winning punters; in fact, quite the opposite.

So, in conclusion, do not complain about the dumbing down or the lack of smart punting advice; without it your life would be a lot harder. Unless, of course, you really are listening to them.

Six Degrees of Separation

What do Sean Bean the actor and I have in common? At first I thought it might be that we are both from Sheffield, or maybe that he studied drama at Rotherham College of Art and Tech, where I taught for 4 years back in the 80’s. Even closer to home is that my 81-year-old cycling buddy in Portugal has a regular ‘last of the summer wine’ Friday meet-up with his mates in Sheffield, one of whom used to be Sean’s dad until he passed away. It could also be that we both support Sheffield United and idolise Tony Currie, the best English midfield player of the 70’s.

But closer to home for me is that a playwright called Steve Wakelam, a Yorkshire lad, wrote a play back in the late 70’s about two young lads who try their hand at professional punting. I know Steve, although my friend, the guy who introduced me to Racing, knows him better, having been taught by him when Steve was a school teacher. I have met Steve on several occasions at our annual York races soiree in August, and I have always been aware that the play was based on my friend John and myself. What I did not know until recently is that it was filmed as a BBC1 play, with Sean playing what appears to be the third lead role (alas, not John or me). I have not seen it, but at least it would be one role in which Sean would not have a problem with the accent.

I do not think the two parts map neatly onto me and my mate John; rather, they appear to be an amalgamation of both of us. My friend did work as a groundsman at a monument and does have a more romantic view of Racing, whilst I hold the more hard-nosed mathematical viewpoint. The second character, who appears to be the proverbial loser along for the ride, is hopefully purely fictional.

http://www.compleatseanbean.com/punters.html

AE Ratings V Random Forests

I have spent the last few days working on a Random Forest version of my own flat handicap ratings. The original ratings are based on AE values, or Actual divided by Expected values to give them their full name. Let me remind you what AE values are. Say we are calculating the AE value of last time out winners. We can look at all lto winners and, for each horse, calculate its market chance by taking its SP or BFSP, stripping out the over-round and converting the resulting odds into a probability of winning. So an even money shot should win 0.5 times, i.e. half the time, if the odds are true. We sum all these win chances, along with the actual win count for these horses. This gives an E (expected) value and an A (actual) value. If we divide A by E and the result is greater than 1 then, in our example, last time out winners are winning more often than the market estimates. If the value is less than 1 then the market is over-betting them.
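The AE calculation just described can be sketched in a few lines of Python. This is an illustration rather than my ratings code: the runner fields (`race_id`, `odds`, `won`, `lto_winner`) are hypothetical names, and it assumes every runner in each race is present so that the over-round can be stripped out by normalising each race's implied chances to sum to 1.

```python
from collections import defaultdict


def ae_value(runners, qualifies):
    """Actual / Expected wins for the runners picked out by `qualifies`.

    runners: dicts with 'race_id', 'odds' (decimal SP or BFSP) and 'won'.
             All runners in each race are needed to measure the over-round.
    qualifies: function returning True for the group being studied,
               e.g. last time out winners.
    """
    # Sum of raw implied chances per race = the book (over-round)
    book = defaultdict(float)
    for r in runners:
        book[r["race_id"]] += 1.0 / r["odds"]

    actual = expected = 0.0
    for r in runners:
        if not qualifies(r):
            continue
        # Strip the over-round so the chances in each race sum to 1
        expected += (1.0 / r["odds"]) / book[r["race_id"]]
        actual += 1 if r["won"] else 0
    return actual / expected


runners = [
    {"race_id": 1, "odds": 2.0, "won": True, "lto_winner": True},
    {"race_id": 1, "odds": 2.0, "won": False, "lto_winner": False},
    {"race_id": 2, "odds": 1.9, "won": False, "lto_winner": True},
    {"race_id": 2, "odds": 1.9, "won": True, "lto_winner": False},
]
print(ae_value(runners, lambda r: r["lto_winner"]))  # 1.0: 1 win from 1.0 expected
```

In the second race the raw book is about 105%, but after normalising each runner is still given a 0.5 chance, which is the point of stripping out the over-round before summing.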

I trained a Random Forest model on my data for 2009 to 2013 and then tested on the years 2014 and 2015. The original AE model produced the following results for top rated horses.

Bets 7918 Wins 1276 PL +305 to BFSP after comm’ ROI +3.2%

The Random Forest model produced the following results for top rated horses

Bets 7699 Wins 1164 PL +323 to BFSP after comm’ ROI +4.1%

The software used was Python with the scikit-learn Random Forest library. See my intro blog entry on this software.
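For those who want to try something similar, below is a minimal sketch of the train-on-2009-to-2013, test-on-2014-and-2015 setup with scikit-learn, backing the top-rated runner in each test race. My actual features are not given here, so the usage runs on randomly generated data; the function name, feature matrix and 5% commission rate are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def top_rated_results(X, won, odds, race_id, year, train_years, test_years,
                      commission=0.05, seed=0):
    """Train on train_years, then back the top-rated runner in each test race.

    X: feature matrix, one row per runner; won: 0/1 outcomes;
    odds: decimal BFSP; race_id and year: one value per runner.
    Returns (bets, wins, level-stake PL after commission).
    """
    train = np.isin(year, train_years)
    test = np.isin(year, test_years)

    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X[train], won[train])
    # Predicted probability of the "won" class serves as the rating
    rating = rf.predict_proba(X[test])[:, 1]

    t_race, t_won, t_odds = race_id[test], won[test], odds[test]
    bets = wins = 0
    pl = 0.0
    for r in np.unique(t_race):
        idx = np.flatnonzero(t_race == r)
        pick = idx[np.argmax(rating[idx])]  # top-rated runner in race r
        bets += 1
        if t_won[pick]:
            wins += 1
            pl += (t_odds[pick] - 1) * (1 - commission)
        else:
            pl -= 1
    return bets, wins, pl


# Usage on made-up data: 300 races of 8 runners spread over 2009-2015
rng = np.random.default_rng(1)
n_races, field = 300, 8
race_id = np.repeat(np.arange(n_races), field)
race_year = rng.choice(np.arange(2009, 2016), size=n_races)
year = np.repeat(race_year, field)
X = rng.normal(size=(n_races * field, 3))
# One winner per race, favouring runners with a high first feature
strength = (X[:, 0] + rng.normal(size=n_races * field)).reshape(n_races, field)
won = np.zeros(n_races * field, dtype=int)
won[np.arange(n_races) * field + strength.argmax(axis=1)] = 1
odds = rng.uniform(2.0, 20.0, size=n_races * field)

print(top_rated_results(X, won, odds, race_id, year,
                        train_years=list(range(2009, 2014)),
                        test_years=[2014, 2015]))
```

Splitting by year rather than at random keeps the test period genuinely out of sample, which matters with racing data where later seasons are what you will actually be betting on.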

The initial interest in this area stems from an excellent article published by Stefan Lessmann, which is linked below.

http://www.sciencedirect.com/science/article/pii/S0169207009002143

The next step for me is to extend the model by following Lessmann and co’s example of adding a second step: combining the resulting RF ratings with the market price of each horse using regression, to eventually produce an oddsline. Of course BFSP is not known until after the off, but final prices can be a good estimate. Lessmann and Benter argue for this two-step separation of fundamental race parameters and odds because it stops the odds swamping the model parameters, which is what tends to happen when both are used together at the same time.
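A common way to do this second step, and as far as I understand it the one Benter describes, is a conditional logit: each runner gets a score that is a weighted sum of features, and the scores are turned into win probabilities that sum to 1 within each race. The sketch below is a bare-bones version fitted by plain gradient ascent in NumPy. Using log(RF probability) and log(market probability) as the two features is my assumption, and the data is made up.

```python
import numpy as np


def race_softmax(scores, race_idx, n_races):
    """Convert scores into win probabilities that sum to 1 within each race."""
    top = np.full(n_races, -np.inf)
    np.maximum.at(top, race_idx, scores)       # per-race max, for stability
    e = np.exp(scores - top[race_idx])
    totals = np.zeros(n_races)
    np.add.at(totals, race_idx, e)
    return e / totals[race_idx]


def fit_conditional_logit(F, won, race_id, lr=0.2, iters=1000):
    """Fit beta in P(win) = exp(F @ beta) / sum of exp(F @ beta) over the race.

    F: one row per runner, e.g. [log(RF probability), log(market probability)].
    won: 1 for the race winner, else 0 (exactly one winner per race assumed).
    Returns (beta, fitted win probabilities).
    """
    _, race_idx = np.unique(race_id, return_inverse=True)
    n_races = race_idx.max() + 1
    beta = np.zeros(F.shape[1])
    for _ in range(iters):
        p = race_softmax(F @ beta, race_idx, n_races)
        # Gradient of the log-likelihood: winner's features minus
        # probability-weighted features, summed over races
        beta += lr * F.T @ (won - p) / n_races
    return beta, race_softmax(F @ beta, race_idx, n_races)


# Made-up data: the "market" knows the true chances, the "RF" is noisy
rng = np.random.default_rng(7)
n_races, field = 300, 8
race_id = np.repeat(np.arange(n_races), field)
true_p = rng.dirichlet(np.full(field, 2.0), size=n_races)
winners = np.array([rng.choice(field, p=row) for row in true_p])
won = np.zeros(n_races * field)
won[np.arange(n_races) * field + winners] = 1
market_p = true_p.ravel()
rf_p = np.exp(np.log(market_p) + rng.normal(0, 0.5, size=market_p.size))
F = np.column_stack([np.log(rf_p), np.log(market_p)])

beta, oddsline_p = fit_conditional_logit(F, won, race_id)
print(beta)                 # weights on [RF, market]
print(1 / oddsline_p[:8])   # oddsline as decimal odds for the first race
```

Keeping the RF rating as a single input to this second model is what stops the market price swamping the fundamental variables: the price only gets to compete with one summarised rating rather than with every raw feature.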

I should also perhaps look at some more metrics on this model first, as it may not have escaped your notice that the win rate of the AE model is greater than that of the RF model. Lessmann puts up some strong arguments for Random Forests in his article, so if you are interested in race modelling it might be worth a look.

Watching Frankel

Today saw the first son of Frankel make his debut in the UK, and this also coincided with my finishing the sequel to that excellent book Watching Racehorses by Geoffrey Hutson, the obviously named Watching More Racehorses.

I loved the first book, which attempted to numerically represent those soft, subjective observations thrown at us every weekend by so-called paddock watchers. The new book is not as good, simply because it is padded out somewhat with observations on areas outside the paddock. Nevertheless, it still adds more data to some of those familiar and unfamiliar areas of paddock watching. For example, in the first book sweating is not cited as a negative, but in the second he puts more meat on this observation by stating that sweating is not a negative when the temperature is above 21°C. Another interesting observation is that coltishness is also not a negative. So what is a negative? Well, if you want a negative with a sample size you can get your teeth into, then consider cross nosebands.

How does all this relate to Cunco, the son of Frankel who has just bolted in? Well, he drifted like a barge after becoming coltish in the parade ring. Only he and Mr Hutson seemed to know.

Betfair SP’s Part 2

In my previous blog post I mentioned the care needed when doing research to Betfair SP. This was prompted by an alert from an observant member of the SmarterSig email forum.

Today I will demonstrate just how much difference this anomaly can make. At the moment I am tracking a betting method based on a combination of racing selection strategy and financial trading methods. At first glance, betting to BFSP seemed more attractive than taking a price, provided you can find some sort of stake threshold below which you do not cannibalise your own BFSP with the size of your stake.

Using the odds displayed when you download your betting summary to calculate level-stake PL, I get the following results when comparing backing the selections at the available price with backing them at BFSP.

Available price: Bets 1717, PL +27.3 points after comm

BFSP: Bets 1717, PL +35.2 points after comm

A small increase using BFSP.

Of course things are never that simple, and the prices handed to me via the Betfair download do not account for R4s. Now, calculating the profit as the winnings divided by the bet stake, we get the following for the two categories:

Available price: Bets 1717, PL -3.04 points after comm

BFSP: Bets 1717, PL +22.1 points after comm

The profit from the live prices simply has not survived the R4s that occurred between taking the bets in the last few minutes and the off. The BFSPs, however, will be hit by fewer R4s, perhaps only in markets that have not re-formed, for example after a horse fails to enter the stalls. There was a 0.7% drop in ROI when the R4s on BFSP were accounted for.

Conclusion – when calculating points profit from bet summaries, make sure you use profit divided by stake rather than the displayed price. Also, when assessing new strategies retrospectively to BFSP, you need to account for late R4s. A reduction of 1% on ROI would seem prudent.
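To make the difference between the two calculations concrete, here is a small sketch. The dict fields are hypothetical names for what a bet summary provides, the settled profit is assumed to already include any Rule 4, and commission is taken per winning bet as a simplification (Betfair actually charges it on net winnings per market). The 1% haircut for retrospective BFSP research is the rule of thumb suggested above.

```python
def points_pl(bets, use_settled=True, commission=0.05):
    """Level-stake points PL from a downloaded bet summary.

    Each bet is a dict with 'stake', 'price', 'won' and 'profit', where
    'profit' is the settled gross profit (negative for a loser) and so
    already reflects any Rule 4 deduction.
    """
    total = 0.0
    for b in bets:
        if use_settled:
            points = b["profit"] / b["stake"]  # what was actually paid
        else:
            points = b["price"] - 1.0 if b["won"] else -1.0  # ignores R4s
        if points > 0:
            points *= 1 - commission
        total += points
    return total


def prudent_roi(pl, n_bets, haircut=0.01):
    """ROI per bet with a flat allowance for late R4s missing from BFSP data."""
    return pl / n_bets - haircut


bets = [
    # made-up figures: matched at 5.0, a late R4 cut the profit from 40 to 30
    {"stake": 10.0, "price": 5.0, "won": True, "profit": 30.0},
    {"stake": 10.0, "price": 3.0, "won": False, "profit": -10.0},
]
print(points_pl(bets, use_settled=False))  # price-based: about 2.8 points
print(points_pl(bets, use_settled=True))   # settled: about 1.85 points
```

The price-based figure overstates the return on every bet that was hit by a deduction, which is exactly how the live-price profit above turned from +27.3 into -3.04 points.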

Betfair SP’s

I have mentioned before that Betfair SPs seem to produce a race overround, or should I say underround, below 100% on a good number of races. A possible explanation for this was put forward by a member of the SmarterSig email forum:

https://groups.yahoo.com/neo/groups/SmarterSig/info

The member stated the following which I have to admit I had overlooked.

I assume that I’m not alone in using Betfair SPs as the benchmark to assess the profitability of a potential new system.  Of course there’s an argument that this isn’t entirely accurate as your own theoretical bets might have altered the BFSP but nothing is perfect.

However, I recently noticed that Betfair SPs are NOT recalculated to allow for Rule 4 deductions after a late withdrawal.  I know that Betfair apply their own deduction to any bets (whether a price was taken or BFSP) but had assumed that SPs would be re-normalised after the race to account for this.  Unfortunately it seems they are not, and the historical BFSPs that are released by Betfair in CSV format (or on the Timeform website) do not account for withdrawals.  As far as I know there’s no easy way to get this information, so it means that the profit/loss of any system researched using Betfair SPs is flawed because of this.

The other thing to watch is dead-heats as this will also affect the bottom line.  It’s relatively straightforward to calculate in Win markets but it becomes more complex in place markets.  For instance if there’s a 6-runner race with 2-places paid out in the place market and your horse dead-heats for first place then the dead heat is irrelevant.  It will be treated as a full-stake bet.  However if another horse wins and your horse dead-heats for second in the place market then your return will be calculated to a half stake.  In other words there’s no ‘one size fits all’ solution for dealing with dead-heats.

The main point is about the Rule 4’s though.  Just wondering if anyone else has dealt with this issue before?  It’s hard to assess how much it might actually affect the ‘true’ bottom line of a researched sequence of 1000s of bets.
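The dead-heat point in the quote follows a simple rule: the stake is divided by the number of horses tied and multiplied by the number of paid places they are sharing. Here is a short sketch of that rule; the function names are my own.

```python
def dead_heat_fraction(position, tied, places_paid):
    """Share of the stake settled at full odds under the dead-heat rule.

    position: the finishing position the tied horses share (1 = first).
    tied: how many horses share it.
    places_paid: places paid in the market (1 for a win market).
    """
    places_left = places_paid - (position - 1)
    if places_left <= 0:
        return 0.0  # tie is outside the paid places: a losing bet
    return min(places_left, tied) / tied


def dead_heat_profit(stake, odds, position, tied, places_paid):
    """Gross profit (before commission) on a back bet involved in a dead heat."""
    frac = dead_heat_fraction(position, tied, places_paid)
    return stake * frac * odds - stake


# The forum example: 2 places paid, two-way dead heat
print(dead_heat_fraction(1, 2, 2))  # dead heat for first: 1.0 (full stake)
print(dead_heat_fraction(2, 2, 2))  # dead heat for second: 0.5 (half stake)
```

The same function covers the win market by setting `places_paid=1`, where a two-way dead heat for first settles at half stake.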

Clearly a pinch of salt is needed when assessing any betting approach to Betfair SP, which these days is the common method used. There is probably less of a problem with NH racing, as the bulk of late withdrawals will be stall problems on the flat.

One possible simple solution would be to readjust all prices so that a book which comes out below 100% is normalised to around 101 or 102%, which is more likely to reflect the true BFSP after R4s. If there is any interest in this topic I could look into it further. If the writer of the above is correct then there should be some whopping underrounds when an even money shot gets withdrawn at the stalls.
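The readjustment suggested above could be sketched like this. The 102% target is only the rough figure mentioned, not a measured one, and the prices are made up. Scaling every implied chance by the same factor keeps the horses' relative chances intact while lifting the book to the target.

```python
def normalise_book(odds, target=1.02):
    """Rescale one race's decimal BFSPs so an underround book sums to `target`.

    Books already at or above 100% are returned unchanged.
    """
    book = sum(1.0 / o for o in odds)
    if book >= 1.0:
        return list(odds)
    scale = target / book  # greater than 1, so every price shortens
    return [o / scale for o in odds]


race = [3.0, 4.0, 5.0, 6.0]  # made-up BFSPs, a book of 95%
adjusted = normalise_book(race)
print([round(o, 2) for o in adjusted])
print(round(sum(1 / o for o in adjusted), 4))  # 1.02
```

Running a system's historical results through adjusted prices like these, rather than the raw BFSPs, gives a more conservative estimate of what the bets would really have returned.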

Do The Survey

If you have ever had an account closed or restricted, and I guess that’s just about everyone (if you have not then you are doing something terribly wrong with your betting), then please do the HBF survey into the problem and maybe, just maybe, we can get something done. Please tweet, blog, wear a T-shirt, whatever it takes to get the word around.

Account Restriction/Closure Survey

Which would be more lucrative to Racing, and even to Bookmakers?

1. Restrict and close all punters who beat prices and show any inkling of a brain, and hence see betting turnover shrink as existing punters turn away and new punters do not arrive. In effect, a greater ROI but lower turnover for racing and bookies.

2. Allow an Aussie-type system where winners are allowed bets up to a certain individual liability. Winners now feel free to be more visible, which in turn encourages new players into the sport, who of course will in the majority lose. Net result: lower ROI but much greater turnover, and so greater profit than option 1.

In many ways the above scenario is rather like betting on the exchanges versus betting with bookmakers. With bookmakers you can make a greater ROI but you cannot get much on, if anything at all. When you switch to the exchanges you have to accept that ROI will shrink, so turnover must be vastly increased.

Oddslines a Line In The Sand

I have been busy this week not just watching Cheltenham but also working on a Bayesian set of flat handicap ratings, or should I say a Bayesian version of my flat handicap ratings. Two things have prompted me in this direction. First of all, my ratings do well but less well towards the off time of races, which is a pity because liquidity on Betfair is greatest the closer you get to the off. My reasoning, therefore, is to try and produce the most efficient oddsline with the minimum amount of time and effort, and with Bayes producing natural percentage chances of winning it is hoped that this may prove a way of cranking up the action towards the off.

The second reason is that I have already had some success in this area, hence the title of this post. No, I do not have a wonderful oddsline approach to AW racing as the title suggests. What I have done, in an area completely unrelated to my ratings, is produce a novel oddsline approach. Now forgive me, for obvious reasons, if I do not spell out the nuts and bolts of this method, but I do feel there is a more general message which is worth passing on and could be of use to some of you.

My ‘novel oddsline’ approach initially did not generate profits; if memory serves me right, I think it produced a small loss on Betfair. What I did next was simply to move the dividing line which specified bet or no bet. In other words, where I had previously bet if the odds were greater than X, I changed this to bet where the odds are greater than X + y%. I moved the line in the sand, so to speak. This has now generated a profit of around +6% after commission from around 5,000 bets, all at prices under 10.0 on Betfair.
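The general idea of moving the line in the sand, as opposed to my particular oddsline, can be sketched as a simple sweep over margins. Each runner carries a market price, an oddsline price and the result; for each margin y we back only where the market price exceeds the line price by more than y. Applying the margin multiplicatively to the odds is my assumption, and the three bets are made up.

```python
def roi_by_margin(bets, margins, commission=0.05):
    """Level-stake ROI when backing only where market odds > line * (1 + y).

    bets: (market_odds, oddsline_odds, won) tuples, one per runner.
    margins: the y values to try, e.g. [0.0, 0.1, 0.2].
    Returns {y: (number_of_bets, roi)}.
    """
    results = {}
    for y in margins:
        n, pl = 0, 0.0
        for market, line, won in bets:
            if market > line * (1 + y):  # the "line in the sand"
                n += 1
                pl += (market - 1) * (1 - commission) if won else -1.0
        results[y] = (n, pl / n if n else 0.0)
    return results


bets = [(3.2, 2.5, True), (4.0, 3.9, False), (6.0, 4.0, False)]
print(roi_by_margin(bets, [0.0, 0.2]))
```

The trade-off is visible even in this toy example: a wider margin means fewer bets, so on real data it is worth checking that the chosen margin still leaves a sample large enough to trust.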

For older members of SmartSig this may well have a familiar ring to it. Some of you may recall an excellent article in the original SmartSig mag’ by an author called Filly. In this article Filly’s main point, if memory serves me well, was that just about any sensible approach to ranking or selecting horses could be made profitable by moving the price threshold. At the time I was not sure whether I could buy into this. Some members were even wondering whether a forecast such as the Racing Post betting forecast could be used, so that by betting at an appropriate percentage above the forecast price a profit would ensue.

It seemed at the time to be pushing the boundaries of acceptance, but the more experienced members of the group were more in the Filly camp than out of it. In fact, I seem to recall that it either won the article of the year award or came second.

I will keep you informed of progress on the Bayes approach but in the mean time remember not all fillies are bad bets.

If you recall the article or have some personal opinion on oddslines in general please leave a comment. All bloggers appreciate some sort of feedback.

The Humble KNN

I hope you got something out of the intro session on machine learning and Python. I think the KNN model is often used as a first example of ML because it is fairly easy to get your head around what is happening under the bonnet. This raises the question: how important is it to know what is going on under the bonnet? Can we treat these ML algorithms as black boxes, or do we need some understanding of the underlying mechanism?

Part of an answer to this question came when I progressed further with my ML investigation. I first thought that a KNN method might be well suited to trainer patterns. Let’s face it, we are always wondering whether a certain trainer does well with certain characteristics of a horse’s profile, and are we not always told that trainers are creatures of habit, repeating the same winning strategies time and again?

I did a rundown of the trainers with the most runners in handicaps. The first thing that struck me about this data was the U-shaped curve of losses from blind betting their runners. In other words, if you backed all runners, trainers with poor strike rates lost you more than medium strike rate trainers, but results became poor again with high strike rate trainers. The upper levels are, I presume, so well known that the public simply overbets them. I therefore decided to select the 10 trainers from the sweet zone in the middle who had the most runners in handicaps.
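A rundown like this is straightforward to reproduce: work out each trainer's overall strike rate, drop the trainers into strike-rate bands, and total the blind level-stake returns per band. The sketch below does that. The band cut points and the tiny dataset are made up, and banding trainers by the same runs you then measure is in-sample, so treat the output as descriptive rather than predictive.

```python
from collections import defaultdict


def roi_by_strike_rate_band(runs, bands):
    """Blind level-stake ROI for trainers grouped by their overall strike rate.

    runs: (trainer, odds, won) tuples, one per handicap runner.
    bands: ascending strike-rate cut points, e.g. [0.08, 0.12, 0.16].
    Returns {band_index: roi}, where band 0 is the lowest strike rates.
    """
    per_trainer = defaultdict(list)
    for trainer, odds, won in runs:
        per_trainer[trainer].append((odds, won))

    totals = defaultdict(lambda: [0, 0.0])  # band -> [bets, PL]
    for rows in per_trainer.values():
        sr = sum(won for _, won in rows) / len(rows)
        band = sum(sr >= cut for cut in bands)
        for odds, won in rows:
            totals[band][0] += 1
            totals[band][1] += (odds - 1) if won else -1.0
    return {b: pl / n for b, (n, pl) in sorted(totals.items())}


runs = [("A", 3.0, True), ("A", 5.0, False), ("B", 4.0, False), ("B", 4.0, False)]
print(roi_by_strike_rate_band(runs, bands=[0.25]))  # {0: -1.0, 1: 0.5}
```

With real data and several cut points, a U shape shows up as the middle bands having the least negative ROI.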

Using the same data as in the previous exercise, minus the trainer strike rate, I discovered, alas, no significant profitable trends for the trainers. The model simply performed poorly. The exercise was not a complete waste of time, however: if you are interested in focusing on the habits and run styles of a select few trainers, I would suggest looking into the middle sweet spot. It may be that trainers outside the top runner count are easier to churn a profit from.

One more green light went on in my head as a result of the KNN exercises and this proved far more promising, in fact that may prove to be a gross understatement.

I decided to take a look at the old chestnut of predicting when a price during the live 10-minute betting shows is a price that will beat Betfair SP. I was looking at prices from UK flat handicaps taken every 7 seconds. The data used in the model was purely technical. Investopedia defines technical analysis thus:

“A method of evaluating securities by analyzing statistics generated by market activity, such as past prices and volume. Technical analysts do not attempt to measure a security’s intrinsic value, but instead use charts and other tools to identify patterns that can suggest future activity.”

This is in contrast to fundamental analysis, where factors external to the market are considered. An example in horse racing would be the jockey or trainer of a horse.

The good thing about this data, in contrast to the previous sessions, is that it was far more balanced in terms of outcomes. Around 47% of prices are inferior to Betfair SP in the under 10.0 range (which is where I focused).
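To show how price snapshots can become labelled training rows, here is a minimal sketch for one horse. The features (current price, change over the last few snapshots, seconds to the off) are hypothetical choices, not the ones I used. The label is 1 when the price at that moment beat or equalled the reference price, which can be BFSP or the last live show.

```python
def make_examples(snapshots, reference_price, lookback=3):
    """Turn one horse's pre-race price snapshots into (features, label) rows.

    snapshots: (seconds_to_off, back_price) pairs, oldest first, taken at a
               fixed interval.
    reference_price: the price to beat, e.g. BFSP or the last live show.
    """
    rows = []
    for i in range(lookback, len(snapshots)):
        secs, price = snapshots[i]
        past = snapshots[i - lookback][1]
        features = [price, price - past, secs]  # purely technical inputs
        label = 1 if price >= reference_price else 0
        rows.append((features, label))
    return rows


def positive_share(rows):
    """Fraction of rows labelled 1, i.e. how balanced the outcomes are."""
    return sum(label for _, label in rows) / len(rows)


snaps = [(70, 4.0), (63, 4.2), (56, 4.1), (49, 3.9), (42, 3.8)]
rows = make_examples(snaps, reference_price=4.0, lookback=2)
print(positive_share(rows))
```

Checking `positive_share` across the whole dataset is a quick way to confirm the kind of balance described above before fitting any model.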

With K set at 11, the model produced 131,474 selections, of which 64.5% were correct in that the price taken beat or equalled Betfair SP. This number of selections would come from approximately 3 weeks of racing. Yes, it is a lot, but the model will make multiple suggestions on the same horse in the same race if at each point it is deemed a good bet to beat SP.
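Since the appeal of KNN is that you can see under the bonnet, here is a from-scratch sketch of what the classifier does with K = 11: measure the distance from a new row to every training row and take a majority vote among the nearest K. In practice scikit-learn's `KNeighborsClassifier(n_neighbors=11)` does the same job far more efficiently; this version is only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=11):
    """Classify `query` by majority vote of its k nearest training rows.

    Uses plain Euclidean distance, so features should be on comparable
    scales (e.g. standardised) before calling this.
    """
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # 0
```

An odd K, such as 11, avoids tied votes when there are only two outcomes (beats SP or not).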

To my naive non-trading eyes this looked quite promising, but things got better on two counts. First of all, I substituted Betfair SP with the last live show; after all, it is unlikely from a trading perspective that one would trade to BFSP. You would have to guess the amount and be in danger of cannibalising your own SP. Also, given previous revelations about BFSP and the fact that around 40% of books to BFSP are under 100%, it seemed logical that the final show would be a better proposition than BFSP. This proved to be the case, but not by huge amounts: it added about 0.5% to the correct prediction score.

The second adjustment provided a more significant improvement. Using a Random Forest algorithm instead of KNN raised the correct prediction rate to 69%.

Random Forests are an ensemble method of modelling. They use decision trees to model the predictions, but they use a number of trees and then aggregate the results from the individual trees to give a final result. This makes them far less susceptible to overfitting the data. I also used the RF as a simple black box. Python is useful in this way, in that you can plug the alternative model into your existing code as shown below.

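from sklearn.ensemble import RandomForestClassifier

# 100 trees, each grown on a bootstrap sample of the training rows;
# scikit-learn averages the trees' predicted probabilities to classify
rf = RandomForestClassifier(n_estimators=100)

# data_train, winlose_train and data_test are the same arrays used for
# the KNN model in the previous session
rf.fit(data_train, winlose_train)

predicted = rf.predict(data_test)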

The rest of the code is essentially the same.
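To illustrate the drop-in swap, here is a self-contained sketch that fits both models through the identical `fit`/`predict` interface and compares their hit rates. The data is randomly generated as a stand-in for the price features and beat-SP labels, so the numbers it prints say nothing about real markets. A random train/test split is used only for brevity; with real price data a time-ordered split would be the safer choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-in for price-snapshot features and beat-SP labels
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 3))
beat_sp = (data[:, 1] + 0.5 * rng.normal(size=2000) > 0).astype(int)

data_train, data_test, y_train, y_test = train_test_split(
    data, beat_sp, test_size=0.25, random_state=0)

models = {
    "knn": KNeighborsClassifier(n_neighbors=11),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(data_train, y_train)        # same interface for both models,
    predicted = model.predict(data_test)  # so swapping is a one-line change
    scores[name] = accuracy_score(y_test, predicted)
    print(name, round(scores[name], 3))
```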

I would welcome the thoughts of any traders out there on this topic.