Advanced Python crawler for modern JavaScript-based websites. Designed to extract data from dynamically loaded pages where classic HTML parsing is not enough.

ShineXmRedT14/AsyncCrawler

This is Async Web Crawler:
|____What this project can do:
|____1. It parses URLs and extracts the hrefs from them as absolute links.
|____2. It can render JavaScript sites (React, Vue, Angular) when the response text does not contain any links.
|____3. It can imitate human behavior, which helps against anti-bot protection on sites.
|____4. All URLs found by the crawler are saved into domains.bd (SQLite).
|____Dependencies:
|____All dependencies are listed in requirements.txt.
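Items 1, 2 and 4 above can be illustrated with a minimal sketch that uses only the Python standard library. It is not taken from this repository: the names `HrefCollector`, `extract_links`, `needs_js_rendering` and `save_links`, as well as the `urls` table, are hypothetical, and the project itself may rely on different libraries. The JavaScript rendering step is left out; the sketch only shows how the decision to fall back to rendering could be made.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Keep web links only (skips mailto:, javascript:, and similar).
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return the unique absolute links found in html, in first-seen order."""
    parser = HrefCollector(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


def needs_js_rendering(links):
    """Assumed heuristic: no links in the raw response means the page is built by JavaScript."""
    return not links


def save_links(conn, links):
    """Store links in a hypothetical `urls` table, ignoring ones already saved."""
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)", [(u,) for u in links]
    )
    conn.commit()


if __name__ == "__main__":
    page = '<a href="/docs">Docs</a> <a href="https://example.org/a#top">A</a>'
    found = extract_links("https://example.com/start", page)
    if needs_js_rendering(found):
        print("no links in raw HTML, a browser-based render would be needed")
    else:
        # An in-memory database keeps the demo free of side effects.
        save_links(sqlite3.connect(":memory:"), found)
        print(found)
```

Using `INSERT OR IGNORE` with a primary key on the URL means a page that is crawled twice does not produce duplicate rows.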

How to start this Web Crawler:
|____1. Install all dependencies.
|____2. Run the main.py file.

Planned features:
|____1. Running the code from the command line (cmd).
|____2. Some optimization.
|____3. A better terminal GUI (the current terminal GUI is poor).

Author: ShineXmRedT14
License: MIT
