The project has three main parts:
- We created our own cluster in our lab with 7-8 computers, where one machine was the NameNode and the others were DataNodes. -> 1. We first set up HDFS and uploaded a file of around 4 GB to the cluster.
-> 2. We performed MapReduce: we ran a word-count program on that file.
-> 3. We implemented Hive, in which we created a database and performed certain operations on it.
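The word-count job in step 2 can be sketched as below. This is a minimal Python mock-up of the mapper/reducer logic, not the actual program we ran on the cluster (a real Hadoop job would be the bundled `wordcount` example or a Hadoop Streaming script; the function names here are just illustrative):

```python
from collections import Counter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]

def reducer(pairs):
    # Reduce phase: sum the counts for each distinct word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Tiny stand-in for the 4 GB input file.
lines = ["hello hadoop", "hello hdfs"]
pairs = [p for line in lines for p in mapper(line)]
print(reducer(pairs))  # {'hello': 2, 'hadoop': 1, 'hdfs': 1}
```

On the real cluster the shuffle between the map and reduce phases is done by Hadoop itself; here it is just the flat `pairs` list.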
-
We also created the cluster using AWS instances. // At run time the script asks the user for the number of instances to launch, then makes one of those instances the NameNode and the others DataNodes.
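The run-time logic described above can be sketched as follows. This is only the role-assignment part; the function name is hypothetical, and the actual launching of EC2 instances (e.g. via boto3) is omitted since it depends on credentials and AMIs not given here:

```python
def assign_roles(num_instances):
    """Make the first instance the NameNode and the rest DataNodes."""
    if num_instances < 2:
        raise ValueError("need at least one NameNode and one DataNode")
    return ["namenode"] + ["datanode"] * (num_instances - 1)

# In the real script, num_instances comes from user input at launch time.
print(assign_roles(4))  # ['namenode', 'datanode', 'datanode', 'datanode']
```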
-
We also created the cluster using Docker.