1,929 changes: 1,929 additions & 0 deletions HackForLA_WebScraping_BeautifulSoup.ipynb

2,908 changes: 2,908 additions & 0 deletions HackForLA_WebScraping_Reddit_API.ipynb

435 changes: 435 additions & 0 deletions HackForLA_WebScraping_selenium.ipynb

Binary file added Web_Scraping_Tutorial.docx
Binary file not shown.
1,929 changes: 1,929 additions & 0 deletions web-scraping/HackForLA_WebScraping_BeautifulSoup.ipynb

2,908 changes: 2,908 additions & 0 deletions web-scraping/HackForLA_WebScraping_Reddit_API.ipynb

435 changes: 435 additions & 0 deletions web-scraping/HackForLA_WebScraping_selenium.ipynb

87 changes: 87 additions & 0 deletions web-scraping/README.md
@@ -0,0 +1,87 @@
# 1 Introduction

- What Is Web Scraping?

- When to Use Each Scraping Method

- Ethical & Legal Considerations (robots.txt, Terms of Service)
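
As a rough illustration of the robots.txt check mentioned above, the sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt content, the `tutorial-bot` user agent, and the `example.com` URLs are placeholders invented for this example, and the rules are inlined so the snippet runs without network access. Passing this check does not replace reading a site's Terms of Service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "tutorial-bot", "https://example.com/products"))
print(is_allowed(SAMPLE_ROBOTS, "tutorial-bot", "https://example.com/private/data"))
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` would fetch the real file instead of parsing an inline string.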

# 2 Prerequisites

- Python Basics

- Installing Required Libraries

- Working with Jupyter Notebooks

- Understanding HTML Structure

# 3 Method 1: Web Scraping with BeautifulSoup

- How Static Websites Work

- Sending HTTP Requests with `requests`

- Parsing HTML with BeautifulSoup

- Extracting Titles, Prices, Links

- Looping Through Multiple Items

- Converting Output to DataFrame/CSV
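
The parsing and looping steps listed above might look roughly like the following. This is a minimal sketch, not the notebook's actual code: the HTML snippet, the `item` and `price` class names, and the `parse_items` helper are all made up for illustration, and the markup is inlined so that no HTTP request is needed. In a real run the HTML would typically come from `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical static page fragment; real pages will use different tags and classes.
SAMPLE_HTML = """
<ul>
  <li class="item"><a href="/a">Widget A</a><span class="price">$10.00</span></li>
  <li class="item"><a href="/b">Widget B</a><span class="price">$12.50</span></li>
</ul>
"""


def parse_items(html):
    """Extract title, link, and price from each list item in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="item"):
        link = item.find("a")
        price = item.find("span", class_="price")
        rows.append({
            # Guard against missing elements so one bad item does not stop the loop.
            "title": link.get_text(strip=True) if link else None,
            "link": link.get("href") if link else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return rows


for row in parse_items(SAMPLE_HTML):
    print(row)
```

The resulting list of dictionaries can be handed to `pandas.DataFrame(rows)` or written out with the `csv` module.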

# 4 Method 2: Web Scraping with Selenium

- When to Use Selenium

- Launching a Browser Using WebDriver

- Understanding XPath

- Extracting Dynamic Content

- Interacting with Pages (clicking, scrolling)

- Capturing Data and Exporting Results

# 5 Method 3: Using APIs Instead of Scraping (Reddit API with PRAW)

- Why APIs Are Preferred Over Scraping

- Creating a Reddit Developer App

- Authenticating Using PRAW

- Extracting Subreddit Posts & Metadata

- Collecting Comments & Replies

- Exporting API Data

# 6 Data Storage & Cleaning

- Creating CSV Files

- Building DataFrames

- Handling Missing or Inconsistent Fields
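
One possible way to handle missing or inconsistent fields before writing a CSV is sketched below with the standard-library `csv` module. The `FIELDS` column list and the `normalize` and `to_csv` helpers are assumptions made for this example; the tutorial's notebooks may take a different route, for instance through pandas.

```python
import csv
import io

# Assumed column set for scraped records; adjust to the data actually collected.
FIELDS = ["title", "price", "link"]


def normalize(record):
    """Return a row with every expected field; missing or None values become ''."""
    row = {}
    for field in FIELDS:
        value = record.get(field)
        row[field] = str(value).strip() if value is not None else ""
    return row


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(normalize(record))
    return buffer.getvalue()


scraped = [
    {"title": " Widget A ", "price": "$10.00"},           # no link field at all
    {"title": "Widget B", "price": None, "link": "/b"},  # price is missing
]
print(to_csv(scraped))
```

To write a file rather than a string, the same `DictWriter` can be pointed at `open("output.csv", "w", newline="")`.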

# 7 Best Practices

- Avoiding Blocks & Rate Limits

- Choosing Between BeautifulSoup, Selenium, and an API

- Error Handling and Logging

- Avoiding Anti-Scraping Protections
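
For the rate-limit and error-handling items above, one common pattern is to retry a failed request with exponentially growing pauses and to log each failure. The sketch below is illustrative only: `fetch_with_retry` and its parameters are hypothetical names, `fetch` stands in for any zero-argument callable that performs a request, and catching every `Exception` is a simplification that real code would likely narrow to network errors.

```python
import logging
import time

logger = logging.getLogger("scraper")


def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt seconds and try again."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as error:  # simplification: narrow this in real code
            last_error = error
            logger.warning("attempt %d of %d failed: %s", attempt + 1, retries, error)
            if attempt < retries - 1:
                # Back off before the next attempt: 1x, 2x, 4x, ... the base delay.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

A call might look like `fetch_with_retry(lambda: requests.get(url, timeout=10))`. The `sleep` parameter exists so that the waiting behaviour can be swapped out when trying the function without real delays.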

# 8 What You Can Build Next (Expand Your Skills)

- Applying Scraping in Real Projects

- Additional Tools & Advanced Techniques

- Next Steps in Data Collection & Automation

# 9 Contributor & Acknowledgements
Binary file added web-scraping/Web_Scraping_Tutorial.docx
Binary file not shown.