
My learning and personal project for web automation and web scraping


3LexW/WebScrappingPractice


WebScrappingPractice

Code for web scraping practice using the requests and BeautifulSoup packages in Python.

Practice 1. Obtain URL from Google

See google.py for the code.

Practice 2. Extract all links on the White House website that point to briefings and statements

See whiteHouse.py for the code.
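
As a rough illustration of the link-filtering idea in Practice 2, here is a minimal sketch. It is not the code in whiteHouse.py. To stay runnable without extra installs it uses the standard library's html.parser in place of BeautifulSoup, and the names `LinkCollector`, `extract_links` and `SAMPLE_HTML`, as well as the `briefings-statements` path keyword, are assumptions made for the example. It parses an inline HTML string rather than downloading the live page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, keyword):
    """Return absolute, de-duplicated links whose URL contains `keyword`."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if keyword in absolute and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<a href="/briefings-statements/example-one/">One</a>
<a href="/about/">About</a>
<a href="https://www.whitehouse.gov/briefings-statements/example-two/">Two</a>
<a href="/briefings-statements/example-one/">One again</a>
"""

links = extract_links(
    SAMPLE_HTML, "https://www.whitehouse.gov", "briefings-statements"
)
for link in links:
    print(link)
```

With a real page, the HTML string would come from the requests package (for example `requests.get(url).text`), and the keyword would need to match whatever path the site currently uses for briefings and statements.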

Practice 3. Obtain most of the race results from the Hong Kong Jockey Club

The requests package could not download the page source from this website, so the web automation tool Selenium is used instead. Selenium is a browser scripting package, used here to load each page in a real browser and read its source. Once the source is available, BeautifulSoup extracts the information needed: every race result for every race day available on the website. The results are then written to a CSV file for further processing.
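
The three stages described above (load the page with Selenium, parse it with BeautifulSoup, write a CSV file) could be sketched roughly as follows. This is not the code in HKJCreadResult.py: the function names are hypothetical, the parser simply collects every table row rather than targeting the site's actual markup, and the Selenium and BeautifulSoup imports are placed inside the functions only so that the CSV helper can be tried without those packages installed.

```python
import csv


def fetch_page_source(url):
    """Load `url` in Chrome through Selenium and return the rendered HTML."""
    from selenium import webdriver  # lazy import, see the note above

    driver = webdriver.Chrome()  # needs a ChromeDriver that Selenium can find
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def parse_table_rows(html):
    """Return the cell text of every non-empty table row as lists of strings."""
    from bs4 import BeautifulSoup  # lazy import, see the note above

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


def write_rows_csv(rows, path):
    """Write a list of rows to `path` as CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

A run would chain the three calls, for example `write_rows_csv(parse_table_rows(fetch_page_source(url)), "results.csv")`, where `url` is a results page address that you supply. Keeping the stages separate also makes it easy to test the parsing and CSV steps on saved HTML without opening a browser.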

To run the code, download chromedriver.exe into the same folder as the script and install selenium.

Because the script drives a web browser, it takes more than an hour to run. For a quicker test, consider modifying the code first.

See HKJCreadResult.py for the code and HKJCResults.csv for the results.
