GrandLay-e/company-scraper

Company Scraper & Auto-Apply Bot

This project is a Python automation tool that scrapes company data from Welcome to the Jungle and can automatically apply to jobs using Selenium. It collects company information, saves it to JSON and SQLite, and automates job applications with a personalized cover letter.

Features

  • Scraping: Collects company data (name, domain, location, website, offers, etc.) from the Tech sector.
  • Data Storage: Saves data to both a JSON file and a SQLite database.
  • Filtering: Filters companies by location, domain, and whether they accept spontaneous applications.
  • Auto-Apply: Uses Selenium to log in and apply to filtered companies with a custom cover letter.
  • Logging: Logs all actions and errors to a log file.
  • Application Tracking: Keeps track of companies already applied to.
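As an illustration of the logging feature, here is a minimal sketch of a file-backed logger that writes to log/process.log. The function name `setup_logging`, the logger name, and the message format are assumptions made for this example; the actual setup in src/ may differ.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: str = "log", filename: str = "process.log") -> logging.Logger:
    """Create the log folder if needed and return a logger that writes to a file.

    Hypothetical helper: the name and format are illustrative, not the project's API.
    """
    folder = Path(log_dir)
    folder.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("company_scraper")  # assumed logger name
    logger.setLevel(logging.INFO)

    # Attach the file handler only once, so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(folder / filename, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `setup_logging().info("Scraping started")` would then append a timestamped line to the log file.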

Project Structure

.
├── data/
│   ├── applied           # List of companies already applied to
│   └── data.json         # Scraped company data in JSON format
├── log/
│   └── process.log       # Log file for scraping and application process
├── src/
│   ├── apply.py          # Script for automated job applications
│   ├── Companies.py      # Companies collection class
│   ├── Company.py        # Company data model and persistence
│   ├── functions.py      # Scraping and utility functions
│   ├── IDS.py            # Credentials and cover letter template
│   ├── main.py           # Main entry point for scraping
│   ├── SELECTORS.py      # CSS selectors and constants
│   └── __pycache__/      # Python bytecode cache
├── .gitignore
└── README.md

Requirements

  • Python 3.10+
  • Google Chrome (for Selenium)
  • ChromeDriver (auto-managed)
  • pip

Python Packages

Install dependencies with:

pip install selenium webdriver-manager beautifulsoup4

Usage

1. Scrape Company Data (currently Tech sector only)

Run the main scraper to collect company data:

python src/main.py
  • This first creates the data and log folders if they don't exist.
  • It then populates data/data.json and data/data.db with company information.
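The storage step can be pictured with the sketch below, which writes the same records to a JSON file and a SQLite database. The function `save_companies`, the `companies` table, and its four columns are assumptions chosen for illustration; the real schema lives in src/Company.py and may differ.

```python
import json
import sqlite3
from pathlib import Path

# Assumed subset of fields; the real model may store more (offers, etc.).
FIELDS = ("name", "domain", "location", "website")


def save_companies(companies: list[dict], data_dir: str = "data") -> None:
    """Write company records to data.json and data.db inside data_dir."""
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)

    # JSON copy: keep accented characters readable with ensure_ascii=False.
    with open(folder / "data.json", "w", encoding="utf-8") as f:
        json.dump(companies, f, ensure_ascii=False, indent=2)

    # Normalise each record so missing fields become NULL in SQLite.
    rows = [{key: company.get(key) for key in FIELDS} for company in companies]

    conn = sqlite3.connect(folder / "data.db")
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS companies ("
                "name TEXT PRIMARY KEY, domain TEXT, location TEXT, website TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO companies "
                "VALUES (:name, :domain, :location, :website)",
                rows,
            )
    finally:
        conn.close()
```

Using `INSERT OR REPLACE` with the name as primary key means re-running the scraper updates existing rows instead of duplicating them.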

2. Auto-Apply to Companies

Warning: This will use your credentials (see src/IDS.py) to log in and apply to jobs.

python src/apply.py
  • The script logs in, filters companies (e.g., Paris, Logiciels, spontaneous application), and applies with a generated cover letter.
  • Applied companies are tracked in data/applied to avoid duplicates.
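Duplicate avoidance could work along the lines of the sketch below. It assumes data/applied is a plain text file with one company name per line; the file format and the helper names `load_applied` and `mark_applied` are hypothetical.

```python
from pathlib import Path


def load_applied(path: str = "data/applied") -> set[str]:
    """Return the set of company names already applied to (empty if no file yet)."""
    file = Path(path)
    if not file.exists():
        return set()
    lines = file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_applied(name: str, path: str = "data/applied") -> None:
    """Append a company name to the tracking file, creating it if needed."""
    file = Path(path)
    file.parent.mkdir(parents=True, exist_ok=True)
    with file.open("a", encoding="utf-8") as f:
        f.write(name + "\n")
```

Before each application, the script would check `name in load_applied()` and skip the company if it is already listed.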

3. Configuration

Customization

  • Cover Letter: Edit the COVER_LETTER function in src/IDS.py to personalize your message.
  • Filtering: Change the filter logic in src/apply.py to target different locations, domains, or application types.
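To make these two customization points concrete, here is a hedged sketch of a cover letter function and a company filter. `CompanyRecord`, `cover_letter`, and `filter_companies` are illustrative names with assumed fields; the actual `COVER_LETTER` signature in src/IDS.py and the filter in src/apply.py may look different.

```python
from dataclasses import dataclass


@dataclass
class CompanyRecord:
    """Hypothetical stand-in for the project's Company model."""
    name: str
    location: str
    domain: str
    accepts_spontaneous: bool


def cover_letter(company_name: str) -> str:
    """Build a short personalized message (placeholder wording)."""
    return (
        f"Hello,\n\n"
        f"I am very interested in joining {company_name} "
        f"and would like to submit a spontaneous application.\n\n"
        f"Best regards"
    )


def filter_companies(
    companies: list[CompanyRecord],
    location: str,
    domain: str,
    spontaneous_only: bool = True,
) -> list[CompanyRecord]:
    """Keep companies matching location and domain (case-insensitive substring)."""
    return [
        c
        for c in companies
        if location.lower() in c.location.lower()
        and domain.lower() in c.domain.lower()
        and (c.accepts_spontaneous or not spontaneous_only)
    ]
```

For example, `filter_companies(companies, "Paris", "Logiciels")` would keep only Paris-based software companies that accept spontaneous applications.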

File Descriptions

Notes

  • Ethics: Use this tool responsibly. Automated scraping and applications may violate the terms of service of some websites.
  • Security: Your credentials are stored in plain text in src/IDS.py. Do not share this file.
  • Maintenance: If the website structure changes, update the selectors in src/SELECTORS.py.

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
