Machine learning has moved firmly into the area of sentiment analysis: the task of detecting whether written text carries more or less of various underlying messages. Is the book review negative or positive overall? Is the person happy or sad, judging by their writing? I could not help wondering whether sentiment analysis could be applied to the question ‘Does Hugh Taylor really fancy that?’
There are some standard Python libraries that can help with sentiment analysis. But before looking at those, let’s take a look at what gets churned out for two contrasting pieces of analysis. The first is Hugh Taylor’s tip today, 5/3/2018, and the other is a more negative analysis of The New One’s chances in the World Hurdle at Cheltenham.
First of all, here is Hugh’s write-up for today:
“Veteran STAMP DUTY doesn’t win very often and might struggle if given his normal hold-up ride in the first division of the extended 1m1f handicap at Wolverhampton (6.45), but he has shaped as if in good form in limited starts this winter and has a positive jockey booking, and he’s capable of going close if not inconvenienced by the run of the race.
He ran an excellent race here two outings ago from a wide draw, and again shaped well here last time when unsuited by the steady gallop. That form has been franked by the next-time-out wins of the second and third.
He ran very well for Luke Morris when runner-up behind an in-form favourite in October despite lacking a recent run, and although much will depend on whether he breaks well enough to take up a reasonable position in a race where there doesn’t look to be much pace, he might run well if getting the run of the race.”
Now, using Python and the TextBlob library, I ran a sentiment analysis on this piece, and the output using its NaiveBayesAnalyzer was
Sentiment(classification='pos', p_pos=0.99997, p_neg=2.5039e-05)
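For anyone wanting to try this, here is a minimal sketch of how a result like the one above can be produced. It assumes TextBlob is installed (pip install textblob) and its corpora have been downloaded (python -m textblob.download_corpora). The function names analyse and summarise are my own for illustration, and this is not necessarily the exact script behind the figures quoted.

```python
from collections import namedtuple


def analyse(text):
    """Run TextBlob's NaiveBayesAnalyzer over a piece of text."""
    # Imported inside the function so this file still loads without TextBlob.
    from textblob import TextBlob
    from textblob.sentiments import NaiveBayesAnalyzer

    # The analyzer trains itself on NLTK's movie review corpus on first use,
    # so the first call can take a little while.
    return TextBlob(text, analyzer=NaiveBayesAnalyzer()).sentiment


def summarise(sentiment):
    """Format any result carrying classification, p_pos and p_neg fields."""
    return "{}: p_pos={:.5f}, p_neg={:.5f}".format(
        sentiment.classification, sentiment.p_pos, sentiment.p_neg
    )


# Stand-in with the same field names, filled with the figures quoted above.
Sentiment = namedtuple("Sentiment", ["classification", "p_pos", "p_neg"])
print(summarise(Sentiment("pos", 0.99997, 2.5039e-05)))
```

With TextBlob available, summarise(analyse(write_up)) would do the same for any write-up held in a string called write_up.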
The result for Hugh’s write-up is firmly positive, the pos value being close to 1. This immediately highlights room for improvement. TextBlob’s Naive Bayes analyser is trained on a standard, general-purpose corpus (movie reviews), so it carries out sentiment analysis in a general form. What would be useful is a corpus geared towards horse racing, or indeed towards Hugh Taylor. Can we feed a machine learning algorithm multiple examples such as the above, along with the results, and train a sentiment analyser that is far better than the above at separating positive from less positive selections?
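To give a flavour of what training on racing write-ups might look like, here is a rough sketch of a bag-of-words Naive Bayes classifier in plain Python. Everything in it is an assumption for illustration: the TinyNaiveBayes class is my own, the four training snippets are made-up placeholders rather than real tips, and the "won"/"lost" labels are invented. A real attempt would need many genuine write-ups paired with actual results.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of write-ups
        self.vocab = set()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def probabilities(self, text):
        """Return {label: probability} for a new write-up."""
        # Words never seen in training carry no information, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


# Placeholder training data: invented snippets with invented results.
training = [
    ("shaped well last time and is capable of going close", "won"),
    ("ran an excellent race and the form has been franked", "won"),
    ("might struggle and doesn't win very often", "lost"),
    ("much will depend on the break and he might not stay", "lost"),
]

model = TinyNaiveBayes()
model.train(training)
probs = model.probabilities("ran an excellent race and shaped well")
print(probs)
```

TextBlob also ships its own textblob.classifiers.NaiveBayesClassifier, which accepts the same kind of list of (text, label) pairs and offers classify and prob_classify, so in practice that would probably be the easier route.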
Let us take a look at the output for the more negative analysis for The New One done by a different tipster.
“As a general rule I tend away from horses when they are trying something different at the back end of their careers. There is not really anything in his profile that can help us judge the chances of him seeing out this three-mile trip – rather like it was with Nicholls Canyon in 2017! For his connections sake, I hope it is the same outcome as it was for the ‘unproven’ Mullins runner last year. From a betting point of view I could not have him on my mind. This is not because I do not think he has a chance of winning as he certainly holds some sort of claim. The point with The New One is that he is such a popular horse that his current single figure price does not take account of the realistic possibility that he might not stay.
The New One does seem to be as good as ever judging by his five runs this season. Last time out Sam Twiston-Davies ensured that he made it a real test of stamina on heavy ground over two miles at Haydock. Being viewed as a stayer over two miles is a world away from lasting home over three miles. I hope he does have the requisite stamina for the trip as a victory for The New One in the Stayers’ Hurdle would be as popular as a win for Cue Card in the Ryanair Chase.”
Here the output reflects a more negative impression, although it still gives an overall positive classification.
The p_neg has increased, showing that the algorithm was capable of recognising that this text is more negative than Hugh’s analysis.
Given that tipsters are never likely to issue a tip such as
“This horse is a dog, I would love to take a gun and shoot it rather than back it”
we can expect high pos values and an overall positive categorization, but with training, better and more accurate predictions may be forthcoming.
By the way the output for the above was
Sentiment(classification='neg', p_pos=0.37991, p_neg=0.62008)
If you would like to see the code behind this and install instructions please leave a comment.
UPDATE Tuesday 6/3/18
Can this approach have any positive effect? Can it improve the bottom line when following Hugh Taylor? Can it highlight which Taylor bets to lay once the mugs have almost done backing them at 20% under the advised price? How much more or less confident is Hugh about today’s selections? The sentiment analysis on today’s bets suggests that Hugh is indeed more bullish than he was about yesterday’s loser. The analysis comes in at
p_pos = 0.99999948 for Beaming
compared to yesterday’s p_pos = 0.99997
and for Mister Music he comes in at
p_pos = 0.99999474
So he would appear more positive about Beaming than Mister Music, but more confident about both than yesterday’s loser.
Perhaps with accumulated data, averages can be derived which would enable a more accurate assessment. Of course, an algorithm trained specifically on Hugh Taylor may be the best overall approach.
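Because these p_pos values are all bunched up against 1, one option (my suggestion, not something the figures above already use) is to convert them to log-odds before comparing or averaging them, which spreads the differences out. A small sketch using the three values quoted in this post, with the helper name log_odds chosen by me:

```python
import math


def log_odds(p_pos):
    """Convert a probability into log-odds, ln(p / (1 - p))."""
    return math.log(p_pos / (1.0 - p_pos))


# p_pos values quoted in this post.
scores = {
    "Stamp Duty": 0.99997,
    "Beaming": 0.99999948,
    "Mister Music": 0.99999474,
}

log_odds_by_horse = {horse: log_odds(p) for horse, p in scores.items()}

# A running average over past tips gives a baseline to judge new tips against.
baseline = sum(log_odds_by_horse.values()) / len(log_odds_by_horse)

for horse, value in sorted(log_odds_by_horse.items(), key=lambda kv: -kv[1]):
    print("{:<13} log-odds {:6.2f} ({:+.2f} vs average)".format(
        horse, value, value - baseline))
```

Three tips is far too few for the average to mean much, but as more write-ups and results accumulate, the distance from the average could be set against actual outcomes.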