Chrome Extensions Archive

The goal is to provide a complete archive of the Chrome Web Store, including version history.

You can see the current status of what's archived and download the files here: dam.io/chrome-extensions-archive/

Installing the extensions

To install an extension, go to chrome://chrome/extensions/ and drag and drop the file onto the page.

To avoid auto-updates, extract the archive and load it as an unpacked extension.

Files are named .zip, but they are the exact same .crx files stored on the store.
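Since the archived files are really .crx packages, loading one as an unpacked extension means extracting the ZIP payload that follows the CRX header. The following is a minimal sketch, not part of this repository: the function names `crx_zip_offset` and `unpack_crx` are hypothetical, and it assumes the publicly documented CRX2 and CRX3 header layouts (the "Cr24" magic, a little-endian version number, then length fields).

```python
import io
import struct
import zipfile


def crx_zip_offset(data):
    """Return the byte offset where the ZIP payload starts inside a CRX file."""
    if data[:4] != b"Cr24":
        raise ValueError("not a CRX file (missing 'Cr24' magic)")
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 2:
        # CRX2: magic, version, public key length, signature length,
        # followed by the key and the signature themselves.
        key_len, sig_len = struct.unpack_from("<II", data, 8)
        return 16 + key_len + sig_len
    if version == 3:
        # CRX3: magic, version, header length, followed by the header.
        (header_len,) = struct.unpack_from("<I", data, 8)
        return 12 + header_len
    raise ValueError("unsupported CRX version: {}".format(version))


def unpack_crx(crx_path, dest_dir):
    """Extract the ZIP payload of a CRX file into dest_dir; return member names."""
    with open(crx_path, "rb") as f:
        data = f.read()
    payload = data[crx_zip_offset(data):]
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

The resulting directory can then be selected with "Load unpacked" on the extensions page. Only extract archives you trust, since the sketch does no validation of member paths beyond what `zipfile` itself performs.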

Running the scripts

The scripts require Python 3.5 or later.

Install dependencies: pip3 install -r req.txt

Create some folders and initialize some files:

mkdir data
mkdir crawled
mkdir crawled/sitemap
mkdir crawled/pages
mkdir crawled/crx
mkdir crawled/tmp
mkdir ../site
mkdir ../site/chrome-extensions-archive
mkdir ../site/chrome-extensions-archive/ext
echo "{}" > data/not_in_sitemap.json
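If you would rather do this setup from Python (for example on a system without a POSIX shell), the same layout can be created with `pathlib`. This is a sketch only: `init_workspace` and `WORK_DIRS` are hypothetical names that do not exist in the repository, and the paths are copied from the commands above.

```python
import json
from pathlib import Path

# Directories taken from the mkdir commands above, relative to the repo root.
# Parent directories are created automatically, so only the leaves are listed.
WORK_DIRS = [
    "data",
    "crawled/sitemap",
    "crawled/pages",
    "crawled/crx",
    "crawled/tmp",
    "../site/chrome-extensions-archive/ext",
]


def init_workspace(repo_root="."):
    """Create the working folders and seed data/not_in_sitemap.json."""
    root = Path(repo_root)
    for rel in WORK_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    seed = root / "data" / "not_in_sitemap.json"
    # Unlike `echo "{}" > file`, do not overwrite an existing file.
    if not seed.exists():
        seed.write_text(json.dumps({}) + "\n")
    return seed


if __name__ == "__main__":
    init_workspace()
```

The one deliberate difference from the shell commands is that an existing `data/not_in_sitemap.json` is left untouched, so the function is safe to run again on a workspace that already holds crawl data.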

Crawling:

crawl_sitemap.py: writes the list of all extensions to data/sitemap.json
crawl_crx.py: uses data/sitemap.json to download the .crx files
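To give a rough idea of what the sitemap step involves, here is a sketch of pulling extension IDs out of a standard XML sitemap. It is not the code in crawl_sitemap.py: the function name `extension_ids_from_sitemap` is hypothetical, the assumption that each `<loc>` URL ends in the extension ID is mine, and the actual structure of data/sitemap.json may differ. It relies only on the sitemaps.org namespace and on the fact that extension IDs are 32 characters in the range a to p.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace used by sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Extension IDs are 32 lowercase letters in the range a-p.
# Assumption: the ID appears as a full path segment of the listing URL.
EXT_ID_RE = re.compile(r"/([a-p]{32})(?:[/?#]|$)")


def extension_ids_from_sitemap(xml_text):
    """Return the unique extension IDs found in a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    ids = []
    seen = set()
    for loc in root.iterfind(".//sm:loc", SITEMAP_NS):
        match = EXT_ID_RE.search((loc.text or "").strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids
```

Deduplicating while preserving order keeps the output stable between runs, which makes it easier to compare one crawl against the next.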

Site & stats:

scan_pages_history_to_big_list.py: builds data/PAGES.json by scanning the pages you crawled
crx_stats.py: builds data/crx_stats.json (what's currently stored)
make_site.py: uses data/crx_stats.json + data/PAGES.json to generate the site
make_json_site.py: uses data/crx_stats.json + data/PAGES.json to generate JSON

Then I serve the files directly with nginx (see the nginx.conf file for an example).

Helping out

I have a few things in mind for the future:

diff of extension versions as a web interface
malware/adware analysis
running an alternative web store (better search, Firefox support, ...)

Don't hesitate to reach out (here on issues, damien@dam.io, or @dam_io on Twitter).

To propose changes, just open a PR. You can also discuss things on Gitter.
