COMP 479 Crawler and Search Engine

Author: Alexander De Laurentiis

Running the server

source ./bin/activate
uwsgi --http 127.0.0.1:8000 --master -p 4 -w server:app

Docker running

sudo docker build -t engine_image .
sudo docker run --network host -p 8000:8000 engine_image

or, to run the container in the background (detached):

sudo docker run -d --network host -p 8000:8000 engine_image

Description

The goal of this assignment was to build a fully functional web crawler that can crawl a large, scalable quantity of data without being limited by available memory. The crawled data is piped into an algorithm that turns the stream into an inverted index, which the front end of the project then uses. The front end is a search engine that uses the BM25 ranking algorithm to score the matching results and return them in ranked order in response to the user's search query.
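To illustrate how an index can be built from a token stream without holding everything in memory, here is a minimal SPIMI-style sketch. It is not the project's actual code: the function names (spimi_invert, merge_blocks), the JSON-lines block format, and the max_postings limit are all assumptions made for the example. The idea is to fill an in-memory dictionary, write it to disk as a sorted block once it grows past a limit, and then merge the sorted blocks.

```python
import heapq
import json
import os
from collections import defaultdict


def spimi_invert(token_stream, block_dir, max_postings=100_000):
    """Consume (term, doc_id) pairs and write sorted index blocks to disk.

    max_postings is a hypothetical stand-in for "memory is full".
    Returns the list of block file paths.
    """
    block_paths = []
    index = defaultdict(dict)  # term -> {doc_id: term frequency}
    postings = 0

    def flush():
        nonlocal index, postings
        if not index:
            return
        path = os.path.join(block_dir, f"block_{len(block_paths)}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for term in sorted(index):  # terms are sorted only at flush time
                f.write(json.dumps([term, sorted(index[term].items())]) + "\n")
        block_paths.append(path)
        index = defaultdict(dict)
        postings = 0

    for term, doc_id in token_stream:
        docs = index[term]
        if doc_id not in docs:
            postings += 1
        docs[doc_id] = docs.get(doc_id, 0) + 1
        if postings >= max_postings:
            flush()
    flush()
    return block_paths


def merge_blocks(block_paths):
    """Merge sorted blocks, yielding (term, {doc_id: tf}) in term order."""

    def read(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                term, postings = json.loads(line)
                yield term, postings

    current_term, merged = None, {}
    streams = (read(p) for p in block_paths)
    for term, postings in heapq.merge(*streams, key=lambda entry: entry[0]):
        if term != current_term:
            if current_term is not None:
                yield current_term, merged
            current_term, merged = term, {}
        for doc_id, tf in postings:
            # The same document can appear in several blocks; sum its counts.
            merged[doc_id] = merged.get(doc_id, 0) + tf
    if current_term is not None:
        yield current_term, merged
```

Because each block is written in sorted term order, the merge step only needs one line per block in memory at a time, which is what keeps the approach independent of the total crawl size.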

Bullet Facts

Crawler and Query processor engine built in Python
Index covers the first 10,000 pages crawled from https://concordia.ca
Returns the 15 most relevant results after scoring
Uses the BM25 ranking algorithm
Crawler uses SPIMI to construct the inverted index
Records the frequency of each word per document, along with the document ID, in the index
Crawler built from scratch using requests and urlparse libraries
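As a rough illustration of how BM25 scoring can work over an index that stores per-document term frequencies, here is a minimal sketch. It is not taken from the project's code: the function name bm25_rank, the index and doc_lengths layouts, the parameter values k1=1.2 and b=0.75, and the non-negative IDF variant are all assumptions. Only the top-15 cutoff comes from the facts listed above.

```python
import math


def bm25_rank(query_terms, index, doc_lengths, k1=1.2, b=0.75, top_k=15):
    """Score documents with BM25 and return the top_k (doc_id, score) pairs.

    index:       {term: {doc_id: term frequency}}   (assumed layout)
    doc_lengths: {doc_id: number of tokens in doc}  (assumed layout)
    """
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = {}
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue  # a term missing from the index contributes nothing
        df = len(postings)
        # One common IDF variant that never goes negative (an assumption here).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalisation: longer-than-average documents are penalised.
            norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf * (k1 + 1) / (tf + norm)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

A call such as bm25_rank(["web", "crawler"], index, doc_lengths) returns at most 15 pairs, highest score first. Documents matching more query terms, or matching them in shorter documents, tend to rank higher.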

Demo

https://alexanderdelaurentiis.com
