Code for web scraping practice using the requests and BeautifulSoup packages in Python.
Practice 1. Obtain URL from Google
See google.py for the code.
Practice 2. Extract all of the links from the White House website that point to briefings and statements
See whiteHouse.py for the code.
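As a minimal sketch of the link-extraction idea (not the code in whiteHouse.py), the example below parses a small hand-written HTML snippet and keeps only the links whose URL contains a keyword. The sample markup, the `briefings-statements` keyword, and the helper name `extract_links` are assumptions made for illustration; the real page layout may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
# With a live page, the HTML would come from something like requests.get(url).text.
SAMPLE_HTML = """
<html><body>
  <a href="/briefings-statements/remarks-one/">Remarks One</a>
  <a href="https://www.whitehouse.gov/briefings-statements/remarks-two/">Remarks Two</a>
  <a href="/about/">About</a>
  <a>No href</a>
</body></html>
"""


def extract_links(html, base_url, keyword):
    """Return absolute URLs of all <a href> links whose URL contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page address.
        url = urljoin(base_url, anchor["href"])
        if keyword in url and url not in links:
            links.append(url)
    return links


# The keyword is an assumed URL fragment, not taken from the real site.
links = extract_links(SAMPLE_HTML, "https://www.whitehouse.gov/", "briefings-statements")
for link in links:
    print(link)
```

Filtering on the resolved URL rather than the raw `href` means relative and absolute links are treated the same way.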
Practice 3. Obtain most of the race results from the Hong Kong Jockey Club
Because the requests package could not download the page source from the website, the web automation tool Selenium is used instead. Selenium is a web browser scripting package, used here mainly to retrieve the page source. Once the source is available, BeautifulSoup extracts the information needed: every race result for every race day available on the website. The results are then written to a CSV file for further processing.
To run the code, install selenium and download chromedriver.exe into the same folder as the script.
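The Selenium, BeautifulSoup, and CSV steps can be sketched roughly as follows. This is an illustration under assumptions, not the code in HKJCreadResult.py: the helper names, the sample table, and its column names are hypothetical, and the real results page will have a different structure. The Selenium import sits inside `fetch_page_source` so that the parsing part can be tried without a browser.

```python
import csv
import io

from bs4 import BeautifulSoup


def fetch_page_source(url):
    """Open url in Chrome through Selenium and return the rendered HTML."""
    # Imported lazily so the parsing helpers below work without Selenium.
    from selenium import webdriver

    # Depending on the Selenium version, the driver path may need to be
    # passed explicitly instead of relying on automatic discovery.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def parse_table_rows(html):
    """Return the text of every table row as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def write_rows(rows, handle):
    """Write the rows to an open text file handle in CSV format."""
    csv.writer(handle).writerows(rows)


# Hypothetical table standing in for the page source of one results page.
SAMPLE_HTML = """
<table>
  <tr><th>Place</th><th>Horse</th></tr>
  <tr><td>1</td><td>Alpha</td></tr>
  <tr><td>2</td><td>Bravo</td></tr>
</table>
"""

rows = parse_table_rows(SAMPLE_HTML)
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With a live page, the HTML would come from `fetch_page_source(url)` and the rows would go to a file opened with `open("HKJCResults.csv", "w", newline="")` rather than an in-memory buffer.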
Because the script drives a web browser, it takes more than an hour to run. For a quicker test, it is suggested to modify the code first, for example by limiting the number of race days scraped.
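One possible way to shorten a test run is to cap how many race days are processed. The sketch below is an assumption about how such a cap could look, not a description of HKJCreadResult.py: `scrape_days`, `scrape_one`, `max_days`, and the placeholder day labels are all hypothetical names.

```python
from itertools import islice


def scrape_days(race_days, scrape_one, max_days=None):
    """Apply scrape_one to at most max_days race days (None means all of them)."""
    results = []
    for day in islice(race_days, max_days):
        results.append(scrape_one(day))
    return results


# Placeholder labels; the real script would use whatever identifies a race day.
race_days = ["day-%02d" % i for i in range(1, 31)]

# Stand-in for the slow browser-based step, so the example runs instantly.
quick_run = scrape_days(race_days, lambda day: day.upper(), max_days=3)
print(quick_run)
```

Passing `max_days=None` keeps the full run unchanged, so the same function serves both the quick test and the complete scrape.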
See HKJCreadResult.py for the code and HKJCResults.csv for the results.