This is the first in a series on web scraping using Python. I am going to assume that you have some basic programming skills, for example that you know what a FOR loop or an IF statement is. If these terms do not mean anything to you then you probably need an introduction to basic programming in Python first. I am also assuming that you are on a Windows-based machine.

Web scraping is the process of gathering data from web pages and placing it into a convenient data format such as a .csv file.
As an example, we might want a .csv file (a comma-delimited flat file) of the day’s runners:

2019-01-18,Chepstow 13:05,NH,Handicap Chase,5YO+,4,4.6,,433,Barbrook Star (IRE),etc,etc
etc
etc
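
To make the target format concrete, here is a minimal sketch of how one such line could be written using Python's built-in csv module. The file name runners.csv, the temporary folder and the shortened row are my own assumptions for illustration; in later sessions the values would come from a scraped web page rather than being typed in.

```python
import csv
import os
import tempfile

# Hypothetical, shortened row based on the sample line above.
row = ["2019-01-18", "Chepstow 13:05", "NH", "Handicap Chase", "5YO+", "Barbrook Star (IRE)"]

# Assumed location: a file called runners.csv in the system temp folder.
csv_path = os.path.join(tempfile.gettempdir(), "runners.csv")

# newline="" stops the csv module writing blank lines between rows on Windows.
with open(csv_path, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(row)

# Read the file back to check what was written.
with open(csv_path, newline="") as csv_file:
    rows = list(csv.reader(csv_file))

print(rows)
```

Each call to writerow adds one comma-separated line to the file, so a day's runners would simply be one call per runner.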

Before we get stuck into the nitty-gritty of scraping, you are first going to need to download and install Python, assuming you do not already have it. Don’t worry, this is quite painless. I suggest using Anaconda as your Python distribution. If you do not already have Anaconda Python, visit the following link and click on the 64-bit or 32-bit installer link, depending on your machine. The installer will be downloaded, and you should then double-click it to begin the installation process. The default settings that are offered to you should all be fine for our purposes, and the process should only take about 5 minutes.

https://www.anaconda.com/download/

Once everything has been installed, check that all is OK, and which version of Python you have, by firing up a Command Prompt window (type cmd into the Windows search box) and then typing the following in this window

python --version

That’s a double dash (two hyphens) in the command above, by the way.

First Program

Let us finish this short first session off with our first program, the usual Hello World.
For now we will stick with rudimentary tools just to keep things simple. We will use Notepad, or you can use WordPad if you prefer.
Fire up Notepad (enter Notepad in Windows search) and enter the following few lines of code (the indentation of print(message) is important).

message = "Hello World"
if ("World" in message):
    print(message)

Save the file with the name HelloWorld.py

Now, in the Command Prompt window where you checked the Python version, change to the folder where you saved the file (using the cd command) and type in

python HelloWorld.py

You should get the Hello World message appearing on the screen.

Try changing the
if ("World" in message):
line to
if ("world" in message):
and then save and run again.

Notice that when you run the program now it prints nothing, because lower-case "world" is not contained in the message. The in test is case sensitive.
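
The experiment above can be condensed into a few lines. This is only a sketch to illustrate the point; the variable names are my own, and converting both sides to lower case is just one common way of ignoring case, not something we need for Hello World.

```python
message = "Hello World"

# The in operator compares characters exactly, so case matters.
exact_match = "World" in message   # capital W, as in the message
wrong_case = "world" in message    # lower-case w, not in the message

# Converting both sides to lower case first gives a case-insensitive test.
ignore_case = "world".lower() in message.lower()

print(exact_match, wrong_case, ignore_case)
```

This kind of case handling tends to matter when scraping, because the same name can appear on a web page with different capitalisation.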

Notice also the indentation. Unlike many other programming languages, Python will not allow the following statement, because the line that belongs to the IF must be indented.

if ("World" in message):
print(message)
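
If you want to see the complaint Python makes without breaking your own file, the sketch below hands the unindented version to Python's built-in compile function as a piece of text and catches the resulting error. The names bad_source and error_text are my own, and the exact wording of the error message varies between Python versions.

```python
# The unindented program from above, held as text so it can be checked safely.
bad_source = 'message = "Hello World"\nif ("World" in message):\nprint(message)\n'

try:
    # compile() checks the code for syntax problems without running it.
    compile(bad_source, "HelloWorld.py", "exec")
    error_text = None
except IndentationError as err:
    # Python refuses the code because the print line is not indented.
    error_text = err.msg

print(error_text)
```

Running the broken file directly with python HelloWorld.py reports the same IndentationError, together with the line it occurred on.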

I am not going to deliver a full-blown intro to Python during this series, but I may mention things that are peculiar to Python, just in case you have programmed in other languages but not Python.

OK, that will do for now. In the next session we will start doing some actual web scraping.