
A repository for the Big Data project conducted during the 7th semester of Bachelor's degree studies in Data Science at the Warsaw University of Technology.


HubertR21/BigDataProject


BigDataProject


To run the project:

1. Create the working directory in HDFS:
   hadoop fs -mkdir /user/BDprojekt
2. Start all the processors defined in the XML files and let them run for at least 5 minutes.
3. Run the sparkito.py script.
4. Go through the .ipynb notebook to see the results.
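The command-line steps above can be collected in a small driver script. This is a minimal sketch, not part of the repository: the helper names (`hdfs_setup_command`, `spark_job_command`, `run`) and the `--run` flag are hypothetical, and it assumes sparkito.py is launched with `spark-submit`. The processors step is left manual because the README does not say how they are started. By default the script only prints the commands (dry run).

```python
import shlex
import subprocess
import sys

# HDFS directory named in the README.
HDFS_DIR = "/user/BDprojekt"
# Assumption: sparkito.py is started with spark-submit (not confirmed by the README).
SPARK_SCRIPT = "sparkito.py"


def hdfs_setup_command(path: str = HDFS_DIR) -> list[str]:
    """Step 1: the README's command for creating the HDFS working directory."""
    return ["hadoop", "fs", "-mkdir", path]


def spark_job_command(script: str = SPARK_SCRIPT) -> list[str]:
    """Step 3: run the Spark processing script once the processors have run."""
    return ["spark-submit", script]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it (failing on a non-zero exit) only if dry_run is False."""
    print("$", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical flag: pass --run to actually execute the commands.
    live = "--run" in sys.argv
    run(hdfs_setup_command(), dry_run=not live)
    # Step 2 is manual: start all processors from the XML files and let them
    # run for at least 5 minutes before continuing.
    run(spark_job_command(), dry_run=not live)
    # Step 4: open the .ipynb notebook to inspect the results.
```

Keeping the dry run as the default means the script can be checked on a machine without Hadoop or Spark installed before it is used on the configured VM.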

A more detailed description of the testing pipeline is given in the ReportBigData file, which is written in Polish.

Note that running the project on a personal machine or a fresh VM may require considerable additional work to install the right packages and configure everything properly.

To avoid this, we provide our working VM on Google Drive: https://drive.google.com/drive/folders/1NGhgW72gCUh4ULpT34Dt97cxf3GwoLk-
