This repository contains a Python-based web scraper designed to automate the extraction of business and professional data from Apollo. The scraper utilizes Selenium for dynamic web interaction and BeautifulSoup for parsing HTML, making it ideal for gathering information across multiple pages efficiently. The code is organized into a class structure for ease of use, scalability, and maintainability.
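As a rough illustration of the BeautifulSoup half of that pipeline, the sketch below parses a hypothetical contact row. The HTML fragment and CSS class names here are invented for illustration; Apollo's real markup will differ and must be inspected in the browser:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment resembling one scraped contact row.
html = (
    '<div class="contact">'
    '<span class="name">Jane Doe</span>'
    '<a class="site" href="https://example.com">site</a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
name = soup.select_one(".name").get_text()   # -> "Jane Doe"
url = soup.select_one(".site")["href"]       # -> "https://example.com"
print(name, url)
```

In the real scraper, `html` would come from Selenium's `driver.page_source` after the page has finished rendering.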
- Login Automation: Automatically logs in to Apollo with your credentials.
- Data Extraction: Scrapes essential information such as:
  - Business Name
  - Website
  - Industry (Niche)
  - Country
  - First and Last Names
  - Job Title
  - Phone Number
  - Personal and Company LinkedIn URLs
  - Personal Email
- Multiple Pages Scraping: Configurable to scrape data from multiple pages.
- Data Export: Saves the scraped data to an Excel file (`.xlsx` format).
- Duplicate Removal: Automatically removes duplicate entries from the final Excel file.
- Randomized User Agents: Uses a random user-agent header to prevent blocking and simulate human browsing behavior.
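The randomized user-agent idea can be sketched as follows. The user-agent strings and the function name `pick_user_agent` are illustrative, not taken from the repository; a production scraper would use a larger, regularly updated pool:

```python
import random

# Illustrative pool of user-agent strings (kept short for the example).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def pick_user_agent():
    """Return a randomly chosen user-agent string from the pool."""
    return random.choice(USER_AGENTS)

# With Selenium's Chrome driver, the chosen string would be applied via
# ChromeOptions before the browser is launched, e.g.:
#   options.add_argument(f"user-agent={pick_user_agent()}")
ua = pick_user_agent()
print(ua)
```

Rotating the user agent on each session makes consecutive runs look less uniform, though it is no guarantee against rate limiting.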
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/apollo-web-scraper.git
  cd apollo-web-scraper
  ```

- Install required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download and set up the Chrome WebDriver that matches your Chrome version.
- Update the following in the code:
  - Your Apollo login credentials (email and password).
  - The Apollo saved list URL you want to scrape.
  - The number of pages you want to scrape.
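These values are typically kept as constants near the top of the script. The names below are hypothetical and should be matched to whatever identifiers the actual code uses:

```python
# Hypothetical configuration constants -- rename to match the actual script.
APOLLO_EMAIL = "you@example.com"       # your Apollo login email
APOLLO_PASSWORD = "your-password"      # your Apollo password
SAVED_LIST_URL = "https://app.apollo.io/"  # placeholder; use your saved list URL
NUM_PAGES = 5                          # how many result pages to scrape
```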
- Modify the `scraper.py` file with your Apollo credentials and target URL.
- Run the script:

  ```bash
  python main.py
  ```
The script will log in to Apollo, scrape the data, and save it to an Excel file (`complete_data.xlsx`). It will also generate a cleaned file with duplicates removed (`output_file.xlsx`).
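The duplicate-removal step can be sketched with pandas. The sample rows below are invented stand-ins; in the real script the frame would be read from `complete_data.xlsx` and the deduplicated result written to `output_file.xlsx`:

```python
import pandas as pd

# Invented sample rows mimicking the scraper's output.
rows = [
    {"Business Name": "Acme", "Personal Email": "a@acme.com"},
    {"Business Name": "Acme", "Personal Email": "a@acme.com"},  # duplicate row
    {"Business Name": "Beta", "Personal Email": "b@beta.com"},
]
df = pd.DataFrame(rows)

# Drop fully identical rows; a subset= argument (e.g. on the email column)
# would treat rows as duplicates on that column alone.
deduped = df.drop_duplicates()
print(len(deduped))  # -> 2

# In the real script:
#   df = pd.read_excel("complete_data.xlsx")
#   df.drop_duplicates().to_excel("output_file.xlsx", index=False)
```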