2YO Handicaps

Prompted by Matt Bisogno’s article on the upcoming Nursery season, I decided to have a look at a race category that I have not examined before. Matt’s excellent article had lots of food for thought; the only thing I questioned was that he had not split the data into train/test partitions. This is something I think applies not solely to Machine Learning but also to system building. So with this in mind I examined Nursery races from 2009 to 2015 inclusive, using the first two years as a data run-in. Any theories could then be tested on 2016/17.

My first port of call on this topic is trainer performance in these races, and it may well uncover a classic example of Nick Mordin’s ‘black is white and white is black’ in the world of betting.

Taking trainers’ strike rates in 2yo races generally, and then also logging their strike rates in 2yo handicaps, proved very interesting. At first I restricted the check to runners whose trainer had a better SR in Nurseries than in 2yo races overall. Prepare to enter Mordin world.

Bets 4866 PL -586 ROI% -12.04 VarPL -53 VROI% -7.18%

Now let’s take a look at those trainers whose Nursery SR% is less than their overall 2yo SR%.

Bets 5326 PL +616 ROI +11.5 VarPL +54.3 VROI +6.82%

All the above are to BFSP before commission.

OK, so how did this theory pan out on 2016 and 2017? Not too badly.

Bets 2155 PL +123 ROI 5.7% VPL +3.8 VROI +1.18%

Certainly this subset is worthy of further attention when deciding a bet. It would appear that punters overbet trainers with obviously good records in Nurseries, whilst run-of-the-mill records are discounted too heavily, even amongst some big-name trainers.

Examples of trainers to avoid would be S Bin Suroor and Roger Varian. They do well with their Nursery runners, but the public seem to know that.

Trainers to watch out for in 2018 are

A B Haynes
A Berry
A Carson
A M Balding
A P Jarvis
A Stronge
Amy Murphy
Andrew Reid
Archie Watson
B Curley
B Haslam
B J Meehan
B Smart
B W Duke
C Allen
C G Cox
C Hills
C R Dore
C W Fairhurst
D Carroll
D Donovan
D J Coakley
D J S Ffrench Davis
D Kubler
D McCain Jnr
D Morris
D O’Meara
D P Quinn
D R C Elsworth
D Simcock
D W P Arbuthnot
Dr J D Scargill
E J Creighton
E J O’Neill
Ed De Giles
Eve Johnson Houghton
F J Brennan
G A Butler
G A Swinbank
Garry Moss
George Scott
H A McWilliams
H Palmer
H Spiller
Henry Spiller
Hugo Palmer
I Mohammed
I Semple
J Akehurst
J D Bethell
J G Portman
J Hetherton
J Howard Johnson
J Hughes
J J Quinn
J L Eyre
J Noseda
J Pearce
J Ryan
J W Mullins
J W Unett
K A Ryan
K Dalgleish
K R Burke
Kevin Frost
L McAteer
L Smyth
M A Jarvis
M Al Zarooni
M D I Usher
M Dods
M E Rimmer
M G Quinlan
M Johnston
M Murphy
M P Tregoning
M R Channon
M S Tuck
M Walford
Mark Gillard
Micky Hammond
Miss Gay Kelleway
Miss J R Tooth
Miss Jo Crowley
Miss L A Perratt
Mrs H S Main
Mrs I G-Leveque
Mrs K Burke
Mrs K Walton
Mrs L C Jewell
Mrs L J Mongan
Mrs L Stubbs
Mrs L Williamson
Mrs Marjorie Fife
Mrs S A Watt
N Wilson
O Stevens
Ollie Pears
P C Haslam
P Charalambous
P D Evans
P Hedger
P J Makin
P M Phelan
P McCreery
P R Chamings
P W Chapple-Hyam
Pat Eddery
Pat Morris
Patrick Morris
R A Fahey
R A Teal
R Brisland
R Curtis
R D Wylie
R Eddery
R G Fell
R Hannon
R Johnson
R M H Cowell
R M Whitaker
Richard Hannon
Richard Spencer
Robyn Brisland
S A Callaghan
S Dixon
S Durack
S Kirk
Sir H R A Cecil
Sir Michael Stoute
Stef Higgins
T B Coles
T D Barron
T D Walford
Tom Dascombe
W G Harrison
W G M Turner
W J Haggas
W J Knight
W J Musson
W Stone


What is a Good Price?

I made the mistake, or my partner made the mistake, of arranging a lunch date with another couple at our home on the day of the England v Sweden quarter final. Luckily the guy was a football fan, so we peeled off at 3pm for the football, minus the cigars and snooker, or is it billiards that gentlemen play?
When we returned to announce that England were through to the semi-finals (which we never would have managed if it was not for Brexit), my partner got rather annoyed that I had not put that fiver on for her when England were 8/1 earlier in the competition, lamenting the fact that they were now around 11/4. Her friend mentioned that when they go racing she never backs the favourite, which is always about 2/1; it’s just not worth it. Suddenly I saw an opportunity for some basic betting tuition and could not resist a quick insight into the error of her statement. I pointed out that if I were to offer her 2/1 on it not raining tomorrow (we are in the middle of a heat wave extending for at least another week), would she not rush to the cash till? The answer was no, because for the amount she puts on it is not worth having a pound on to win two. I pointed out that her betting strategy was being governed by how much she might win and not by what chance she thinks she has of winning compared to the odds I am offering. I was expecting, or hoping for, a light bulb moment, but it wasn’t to be; she still could not bring herself to admit that it made perfect sense and that maybe she should bet according to chance and not how much she may win.

What astounded me about this was that here we had an educated person who simply could not grasp, or maybe accept, that her perspective was totally wrong. Her point of view is that she takes x pounds to the races and is prepared to lose it as part of the day’s cost. I would suggest that in the long run she is aiming to lose it. Part of the problem is that at school we are not taught to think in terms of chance and probabilities. We come away with an ‘it will happen’ or ‘it won’t happen’ view of the world. Politicians suffer from this as much as anyone, as highlighted in the excellent book Superforecasting by Philip Tetlock and Dan Gardner.

For four years I ran the UK’s most successful tipping line in terms of level stake profit, as voted by The Secret Betting Club. During this period I told everyone to have more on all the odds-on shots. I made this suggestion because they made between 12 and 20 points profit each year, and furthermore, being odds-on, a bookmaker was likely to take more. I suggested that this boost would pay for the service, and hopefully a few changed their perspective on odds-on shots, as my suggestion was prompted by a complaint after an odds-on shot got beat.

In my view there are three approaches to price. The first is that you are good at spotting overpriced horses, which is why I backed Saxon Warrior at 11/4 yesterday, beaten a head. The second is that you have an algorithmic approach to producing an oddsline. The third is that you have created a model or system whose selections you are confident the market consistently underbets, and you are also confident that this is unlikely to change.

Adopt one of them but forget about how much you are likely to win.
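
To make the point concrete, here is a minimal sketch of the arithmetic in Python. The numbers are purely illustrative; the message is that value lives in the gap between your chance and the odds on offer, not in the size of the payout.

# A minimal sketch of the arithmetic. A bet is value when your estimated
# chance beats the chance implied by the odds, however small the payout.

def implied_probability(fractional_odds):
    # chance implied by fractional odds, e.g. 2/1 -> 1/3
    return 1.0 / (fractional_odds + 1.0)

def expected_value(my_probability, fractional_odds, stake=1.0):
    # long-run profit per bet at the offered odds
    return my_probability * fractional_odds * stake - (1.0 - my_probability) * stake

# 2/1 against rain in a heat wave: if "no rain" is a 90% chance,
# winning only two points is still a hugely profitable bet
print(implied_probability(2.0))   # 0.333...
print(expected_value(0.90, 2.0))  # +1.70 per point staked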

Sartin and UK Sectionals

In the mid-1980s the Sartin methodology began to gather a lot of attention across the pond with both punters and journalists. The method essentially revolves around finding meaning within the rich source of sectional data available within US racing. Here in the UK we are only just starting to scratch the surface, with Turftrax and, more recently, Total Performance Data starting to cover tracks.
The Sartin method initially focuses on four data items:

1st fraction
2nd fraction
final fraction
X factor

The X factor is a calculation based on the 1st fraction and final fraction, but let’s not worry about that for now.

This article is not going to be pure Sartin but rather a quick look at sectional calls and how predictive they might be. Having said that, I will take a look at the X factor number, although it is worth bearing in mind that Sartin does all his calculations using horse speed in feet per second. I would need to be a bit more confident about the measurements in the data before doing conversions, but using actual times might still be informative.

I decided to look at a subset of data, namely Wolverhampton 7f races. The approach centred around compiling averages, or pars, for the first fraction, second fraction and final fraction calls. The Wolves 7f data alas starts at 4f out, which meant I decided to make the first call 4f out, the second call 2f out and the final call obviously the last 2f to the finish. So just to recap, we have a first section of 3f, a mid section of 2f and a final section of 2f. These averages were based on horses winning races or getting beaten a head or less.

Now the first check I made was how these winners did next time out if they were above par on the first section and above par on the final section. I am focusing on these sections as Sartin’s X factor revolves around them. Now I would expect them to do pretty badly having been above par in both sections. By the way, there is no allowance for class or conditions here yet; it’s very rough and ready.

Bets 16 PL +14.95 to BFSP

Now let’s consider those that ran below par for the first section and above par for the last.

Bets 50 PL +3.58 to BFSP

How about above par for the first section but below par for the last?

Bets 46 PL -9.77 to BFSP

Finally below par in both sections.

Bets 21 PL -18.05

Sartin’s X factor is calculated by the simple formula

(1st fraction + final fraction) / 2

Calculating pars for the X factor and then a ratio for each winning horse via

ratio = horse X factor / Par X factor

We have for a ratio below 1

71 bets PL -14.47 to BFSP

For a ratio above 1

62 bets PL +5.18

The message, if there is one to be had from the above figures, is that horses that have had tough races on the clock do worse next time out, whereas winners who have had easy races on the clock fare better. This is just a hypothesis and would need testing on larger data.
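
For anyone wanting to try this at home, here is a rough sketch of the par and ratio calculation in Python/pandas. The file and column names are hypothetical stand-ins, not my actual data.

import pandas as pd

# Rough sketch of the X factor and par ratio. File and column names
# (first_frac, final_frac, beaten_lengths) are hypothetical; times are in
# seconds, and pars are built only from winners or horses beaten a head or less.
runs = pd.read_csv("wolves_7f_sectionals.csv")

# Sartin's X factor: (1st fraction + final fraction) / 2
runs["x_factor"] = (runs["first_frac"] + runs["final_frac"]) / 2.0

# par X factor from winning (or head-beaten) runs over this course/distance
winners = runs[runs["beaten_lengths"] <= 0.25].copy()
par_x = winners["x_factor"].mean()

# below 1 = faster than par (tough on the clock), above 1 = easier than par
winners["x_ratio"] = winners["x_factor"] / par_x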

But wait a minute, maybe we are just looking at the wrong fractions. I mean, Tom Brohamer states that when a horse is top on the mid fraction and top on the final fraction, it is time to loosen the betting belt. So how do those horses that win and beat par on these last two sections actually do next time out?

Bets 24 PL +11.14 to BFSP

None of this is pure Sartin or Brohamer, but it does perhaps demonstrate that there are plenty of new rich veins of data coming online, and those that ignore it need to be sectioned.

Field Size as a Short Cut

Hugh Taylor had a nice winner yesterday in Buccaneers Vault at 9/1 EP. Here is what he had to say about the horse.

“Dropped in from his wide draw at York, he was last turning for home, but made smooth headway towards the far rail in the straight. The race unfolded up the centre of the track, however, and although he pulled well clear of those who raced close to him, he was unable to land a blow. That was enough to suggest he’s in good form, and the return to 6f will suit”

Looking at the race on video, his account is pretty accurate and was there for all to see, but not everybody saw it. Hugh has admitted that the cornerstone of his methods is looking for horses that ran better than the general public might interpret, and that therefore go off at bigger odds next time than they should. An approach everyone should try to emulate, but not everyone has the time of day to study videos of all yesterday’s racing and make notes/alerts.

One way of short-cutting the practice is to specialise in a particular distance. If you have a favoured area, say sprints, then obviously choose that area. If you have another angle, like pace bias over 8f, then choose 8f races. This is an approach being used in an experiment currently being conducted by a group.

There may, however, be an additional filter which can be adopted to reduce the workload without reducing the accuracy, and that is previous-run field size. Below is a list of the A/E values for runners in handicaps having run in a handicap previously. The A/E values are grouped by the number of runners in the previous race, and the displayed values are for at least 100 sample races.

4 ran 1.020
5 ran 1.013
6 ran 0.954
7 ran 1.013
8 ran 0.945
9 ran 1.026
10 ran 1.015
11 ran 1.009
12 ran 0.999
13 ran 1.001
14 ran 1.029
15 ran 1.042
16 ran 1.016
17 ran 1.035
18 ran 0.971
19 ran 1.105
20 ran 1.094

What the table shows is that runners last time out in 6-runner races have an A/E next time out of just over 0.95. Anything above 1.0 is a sign that the public are underbetting and that the horses are going off bigger than they should. The data is from 2009 to 2013 and shows that if we focus our video watching on runners from fields of 14 or more, we will not compromise our note-taking value, perhaps even improve it, but we will cut down the number of races we have to study. There is a lot more going on in a large field, as Mr Taylor discovered to his benefit.
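
For the curious, here is roughly how an A/E figure like those above can be computed with pandas. The file and column names are hypothetical; expected wins are taken from the probability implied by BFSP.

import pandas as pd

# Sketch of an A/E calculation by last-time-out field size. Column names
# are hypothetical: won is 1/0, bfsp is Betfair SP, lto_runners is the
# field size of the horse's previous handicap run.
runs = pd.read_csv("handicap_runs.csv")

# expected wins: the probability implied by BFSP
runs["exp"] = 1.0 / runs["bfsp"]

ae = runs.groupby("lto_runners").agg(
    actual=("won", "sum"), expected=("exp", "sum"), bets=("won", "size"))
ae["AE"] = ae["actual"] / ae["expected"]

# A/E above 1.0 suggests the public are underbetting that group
print(ae.loc[ae["bets"] >= 100, ["bets", "AE"]])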

Not everyone likes big fields to bet in but you should be studying them after the race.

Raw Sectionals

Sectional data provided by TPDZone is now available on ATR and it is providing a rich vein of analysis and fascination. Simon Rowlands is the guru on this topic, but after some conversations with him it struck me that there is another way to look at the data which may prove interesting. The ATR approach is to look at sectional times, in particular the final section, as a ratio of the rest of the race. This provides a valuable insight into how energy was used through a race, and perhaps who won or lost because of, or in spite of, energy use. The problem is that it does not readily allow you to answer questions like who will lead between horse A, who led at Wolves last time, and horse B, who led at Lingfield. The topography of the two courses is so different that raw times run over the first part would be meaningless, but times relative to actual pars for that section and course might show that one has a better chance.
In addition, assessing the class of horses is difficult via ATR; at a glance you cannot see whether horse A has just run in a class 6 but has posted a class 4 or 5 run.
To address this I compiled par times for each course/sectional/going/class/distance combination.
I then looked at a recent performance and tweeted that the winner, compared to these pars, looked about class 4, the grade it was running in. My next step was to see if I could find a runner who had posted something above class against these pars. I did not have to look far. The first race I looked at, over 7f at Wolves, showed the class 6 runners leading through the first fraction at slightly below class 4 speed. Given that the first three through this gate also finished in the first three, they may well be above class 6. Only one has run since, winning ‘easily’ in class 6. The other two are yet to run. Comment below if you would like to know their names. Sorry to be a tease, but I do get fed up with writing and getting little or no feedback.
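
For the technically minded, the par compilation is nothing more exotic than a grouped average plus a look-up, something like the following sketch (file and field names are hypothetical):

import pandas as pd

# Sketch of the par compilation and look-up. One row per sectional per
# run; file and field names are hypothetical. Keying pars on class means
# a class 6 run can be measured against class 4 and 5 pars too.
sect = pd.read_csv("tpd_sectionals.csv")

pars = (sect.groupby(["course", "section", "going", "race_class", "dist"])
            ["sect_time"].mean())

def implied_class(row):
    # best (lowest-numbered) class whose par this sectional time beats
    for cls in range(1, 8):   # class 1 (best) down to class 7
        key = (row["course"], row["section"], row["going"], cls, row["dist"])
        if key in pars.index and row["sect_time"] <= pars[key]:
            return cls
    return None

# sect.apply(implied_class, axis=1) then tags every sectional with its implied class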

Does a Good Big Un Beat a Good Little Un

There was an interesting post from Simon Rowlands, which I only picked up after the Derby, regarding the stride length of the Derby winner in his pre-Derby run. At around 26+ it is very high; only 6.7% of horses have registered a stride length in excess of 26. There would have been two ways of looking at this pre-Derby: firstly, does it have any significance for him staying, but perhaps more importantly, would he handle the track? Many would have assumed not, based on such a high stride length, as it is generally thought that the undulations of Epsom do not suit the big strider. This set me pondering over the TPD data and whether stride length, as an indicator of size, has any bearing on track preference. After digging around I felt that there is simply too little data yet to really start making predictions, but what I felt I could do at this early stage in the emerging life cycle of TPD data is check whether a big horse holds any betting advantage over small ones.

Calculating the average stride length of horses by track/distance/going/class then enabled me to compile a list of horses that had posted a placed run with a stride length greater than the average for the race they were running in. These were deemed large horses, although clearly those just above average are more average than large. Similarly, those below average were deemed small.
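
In pandas terms the large/small split looks something like this sketch; again the file and field names are hypothetical stand-ins:

import pandas as pd

# Sketch of the large/small split. Stride length is averaged by
# track/distance/going/class and each placed run is tagged against it.
strides = pd.read_csv("tpd_strides.csv")

strides["avg_stride"] = (strides.groupby(["track", "dist", "going", "race_class"])
                                ["stride_len"].transform("mean"))

placed = strides[strides["position"] <= 3].copy()
placed["size_tag"] = (placed["stride_len"] > placed["avg_stride"]).map(
    {True: "large", False: "small"})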

How would we have got on betting them blindly in their following races?

Large horses produced 2783 bets and a ROI of -4.73% to BFSP
Small horses produced 3179 bets and a ROI of -8.9% to BFSP

This difference was more pronounced when restricting the bets to runs in handicaps.

Large horses ROI -1.73%
Small horses ROI -8.36%

That’s a huge difference if it holds up, and one wonders whether handicappers of the future may allocate penalties based on horse size.

Superforecasters

What makes a good forecaster, be it in horse racing, political predictions, social movements or perhaps currency fluctuations? We probably all have an opinion on this one. Some would say intelligence, maybe IQ; they would be wrong. Being smart is no disadvantage, but it is not the main driver behind the super forecasters out there. Perhaps it’s the men and women on TV? Almost certainly not; they are selected on the basis of how much air time they can accurately consume. The book I have almost finished attempts to shine a more objective light on what makes people good forecasters and seeks out those in the general public who fall into the category: Superforecasting: The Art and Science of Prediction.

The people in charge of this project simply advertised for volunteers (actually they got some gift vouchers at the end of the year) to become subjects in an experiment designed to find out who could become accurate forecasters and, most importantly, why they had such traits. Once the individuals had been tested before selection, they were periodically assigned questions such as: will the left or right party win the next Honduras election? What are the chances of Italy leaving the EU or defaulting on its debt? Members had to assign confidence levels to their answers and were allowed, as time progressed, to update their answers. Interestingly, people who were diligent at updating tended to be the best forecasters when their objectively based scores were compiled.

It is a must-read for anyone involved in forecasting. One of the most interesting points for me was when, after a year and before they knew the ranking of forecasters, the people running the experiment decided to run groups. They compiled the groups randomly, even though they knew the dangers of group think and group fallout. Despite this fear, the groups performed better than individuals, and later, when they compiled groups of super forecasters, they too performed better than individual super forecasters, something I have found myself.

So what qualities make up a super forecaster, and do they apply to horse betting?

1. Cautious – Nothing is certain; they are able to think in terms of percentages
2. Humble – Reality is infinitely complex
3. Nondeterministic – What happens is not meant to be and does not have to happen
4. Actively open minded – Beliefs are hypotheses to be tested, not treasures to be protected
5. Intelligent and Knowledgeable, With a Need for Cognition – Intellectually curious; enjoy puzzles
6. Reflective – Introspective and self-critical
7. Numerate – Comfortable with numbers

Within their forecasting they tend to be

8. Pragmatic – Not wedded to any idea or agenda
9. Analytical – Capable of stepping back and considering other views
10. Dragonfly-Eyed – Value a wide range of views, which they then synthesize
11. Thoughtful Updaters – When the facts change, they change their minds
12. Good Intuitive Psychologists – Aware of checking for personal biases

In their work they tend to be

13. Growth Mindset – Believe it’s possible to get better
14. Grit – Determined to keep at it however long it takes

I am sure you will tick a few of those as supremely relevant to horse betting. At the moment I am running a similar forecasting group in horse betting. Each member is assigned a specialist distance, e.g. 5f, and is asked to make selections to the group based on morning value prices. I hope to report back on this later in the year. By the way, we have one vacancy in the group to cover 6f races.

Deep Learning and Horse Racing


I came back from the cinema the other day inspired and fascinated, having sat with one other lone cinema-goer watching AlphaGo.
AlphaGo is a deep learning program created by the company DeepMind to challenge the world champion Go player. Since the defeat of world chess champion Kasparov by IBM’s Deep Blue, the next mountain to climb was always Go. It had so far proved elusive, as the number of game permutations in Go makes Chess look like noughts and crosses, and it was thought that Go might be just too difficult for an AI program. If you get a chance you must see the documentary, as it tracks the development, first beating the European champion and then the world champion. Even more interesting is the reaction of the huge crowd watching the event.

If you have read any of my other posts you will know that I have been impressed by the gains, which seem to be real, surrounding the machine learning algorithm Gradient Descent Boosting (GDB). This algorithm seems to be the de facto Kaggle competition winner at the moment. Kaggle, if you are not familiar with it, is a web site where data science hobbyists and pros take on submitted data sets and see who can produce the best Machine Learning solution. Inspired by Go, I finally got around to checking out Deep Learning and was not surprised to find further gains. I tested three approaches on a simple data set consisting of just two features, namely horse age and days since last run. In all three cases I trained the models on two years of flat handicap data and tested them on one year of handicap data. Deep learning came out ahead of GDB, which in turn beat Random Forests in terms of profit and loss of top rated.
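
For a flavour of what the deep learning test looked like, here is a minimal Keras sketch of a two-feature model. The file and column names, layer sizes and epochs are illustrative assumptions, not the exact model behind the figures above.

import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

# Minimal sketch of a two-feature deep learning rater in Keras. File and
# column names (age, days_since_run, won, race_id) are hypothetical.
train = pd.read_csv("handicaps_train.csv")   # two years of flat handicaps
test = pd.read_csv("handicaps_test.csv")     # one held-out year

features = ["age", "days_since_run"]

model = Sequential([
    Dense(16, activation="relu", input_dim=len(features)),
    Dense(8, activation="relu"),
    Dense(1, activation="sigmoid"),          # estimated win probability
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(train[features].values, train["won"].values, epochs=20, batch_size=64)

# score the held-out year and back the top rated in each race
test["score"] = model.predict(test[features].values).ravel()
top_rated = test.loc[test.groupby("race_id")["score"].idxmax()]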

If this topic would be of interest, perhaps as a hands-on tutorial, then please leave a comment below. In the meantime, probably the first thing you need to do if you want to get involved is to install Tensorflow and Keras. Keras is a front end built on top of Tensorflow and provides simplified access to deep learning. You will need to have Anaconda Python installed which, if you followed my earlier blog on Machine Learning, you should already have; see here

https://markatsmartersig.wordpress.com/2016/01/13/profitable-punting-with-python-1/

Installing Tensorflow and Keras

First you need to create a new environment for your Keras based programs. Pull up a command box (type command in windows search box)

Assuming you have Anaconda installed, enter the following command:

conda create --name deeplearning python

You can change deeplearning to whatever you’d like to call the environment. You’ll be prompted to install various dependencies throughout this process—just agree each time.

Let’s now enter this newly created virtual environment. Enter the following command

activate deeplearning

The command prompt should now be flanked by the name of the environment in parentheses—this indicates you’re inside the new environment.

We now need to install into this new environment any libraries we may need, as they won’t be accessible from the original root environment created when Anaconda was installed.

IPython and Jupyter are a must for those who rely on Jupyter notebooks for data science. Enter the following commands

conda install ipython
conda install jupyter

Pandas is the de facto library for exploratory analysis and data wrangling in Python. Enter the following command

conda install pandas

SciPy is an exhaustive package for scientific computing, but the namesake library itself is a dependency for Keras. Enter the following

conda install scipy

Seaborn is a high-level visualization library. Enter the following

conda install seaborn

Scikit-learn contains the go-to library for machine learning tasks in Python outside of neural networks.

conda install scikit-learn

We’re finally equipped to install the deep learning libraries, TensorFlow and Keras. Neither library is officially available via a conda package (yet) so we’ll need to install them with pip. One more thing: this step installs TensorFlow with CPU support only and not GPU support. Enter the following

pip install --upgrade tensorflow
pip install --upgrade keras

Check all is OK

Get Jupyter Notebook up and running by entering

jupyter notebook

Once you are in, create a new notebook file and simply enter

from keras.models import Sequential
from keras.layers import Dense

Now run the above cell and hopefully all will be OK

Should you at any point wish to remove the new environment simply use the following command

conda remove --name deeplearning --all

That’s enough for now, if there is interest then we could perhaps explore the code sessions.

Just Think Then Did It (actually)

Just spent a sunny afternoon (that’s dedication for you) data munging and setting up the ML approach to pace comment analysis I outlined in the previous blog post. I was curious to see if an ML approach would categorize horses’ positions better than the pace program I already use. To do this I used data from TPD, which gives an accurate data source for horse positions within a race. Once I had mapped early position I could designate horses as leaders, trackers, mid-division or hold-ups. The next task was to feed this data, along with the corresponding race comments, 15,844 lines of data in total, to an ML analyser, after first splitting into train and test sets. The algorithm was asked to learn what category a horse was from its race comment in the training data. Here is an example line or two.

led until over 1f out weakened final furlong,1
tracked leader led over 1f out headed inside final furlong kept on same pace,2

I then asked it to predict the positional number, i.e. 1 to 4, from the comments held out in the remaining test data. It turned out that it did reasonably well, with an accuracy of 69.5%. Well, I say reasonably well; the truth is I had never put my own pace figures to the test against actual TPD data. That was the next step: checking my pace program’s performance against the TPD data.
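
As a rough illustration of the ML side of this exercise, here is a minimal sketch of such a comment classifier using scikit-learn’s bag-of-words plus logistic regression (one choice of many; the file name is hypothetical).

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sketch of a race comment classifier. Input is lines of "comment,label"
# like the examples shown above; labels are the positions 1 to 4.
data = pd.read_csv("pace_comments.csv", names=["comment", "pace"])

X_train, X_test, y_train, y_test = train_test_split(
    data["comment"], data["pace"], test_size=0.25, random_state=1)

vec = CountVectorizer()                  # comments -> word count vectors
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

preds = clf.predict(vec.transform(X_test))
print(accuracy_score(y_test, preds))     # accuracy on held-out comments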

Across the whole test file my program came in at 68.2% accuracy. Not a huge difference from the ML algorithm, but what was very useful about this exercise is that it allowed me to check how my pace program, and of course the figures on the SmarterSig site, do against the various categories 1 to 4.

Predicting leaders it did OK at 69.6% accuracy
Predicting trackers it came in at 81.5% accuracy
Predicting mid-divisions it achieved 44.7%
Predicting hold-ups it managed 73.9%

This is very useful information, as it allows me to re-examine the pace program and potentially fine-tune it, or possibly jump ship to the ML side of processing. The mid-divisions seem to need a bit of TLC, although we will always be hostage to how race comment compilers write up.

Now time for a beer before the sun goes down.

Just Think Then Do It (maybe)

It is a glorious Sunday morning and I am sat in the garden at 8am barefoot and earthing.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265077/

Sometimes it’s best to just stop and think, and then when you have done that, think a bit more. Of course Nike would like you to just do it, but then again increasing their sales is dependent on your compulsion to buy.

While I was thinking I hit upon a rather simple and obvious idea that I had been pondering over the last few days. I produce Pace Figures daily on the SmarterSig web site and, even if I say so myself, they are rather good. They should be; I have tended them carefully over the last 10 years, tweaking them whenever I spot a misreading of a race comment. They form an integral part of my daily betting.

I have, however, been pondering another recent interest of mine, which is Machine Learning and sentiment analysis. Sentiment analysis is the adoption of machine learning techniques to analyse text and derive meaning. If you have read my earlier blog on this you will get the picture. What I am now pondering is whether a machine learning approach could produce better predictions of pace than my hard-wired program approach. Furthermore, it would also be capable of self-updating, or learning. For those of you not familiar with ML, it involves feeding the program as many examples of race comments as you can muster, along with an outcome for each comment. This outcome could be, for example, 1 for held up, 2 for tracked and 3 for led, or perhaps some other variation.

Now the off-putting aspect of this idea is that you would need to sit and watch a lot of races in order to really accurately tag each comment with the correct pace position. Even barefoot in the garden, such a task would guarantee a reversal of that Nike philosophy. However, the rather obvious occurred to me while I wasn’t ‘just doing it’: the TPD tracking data gives me as accurate a track on race position as you can get. Marry this with race comments, plug it into the sentiment analysis, and potentially thousands of race comments can be accurately machine trained. Once trained, the system can then be used to predict future race pace. If TPD extend to all tracks then there would be no need for this, and of course there is no guarantee that this will outperform my carefully hand-crafted pace figure program, but it sure as hell beats ‘just don’t do it’.