SDTask1

Introduction

This repository contains the files for the first Distributed Systems task. The goal of the task is to understand and use communication models and middleware concepts, and to implement the MapReduce model.

MapReduce is a programming model and implementation to enable the parallel processing of huge amounts of data. In a nutshell, it breaks a large dataset into smaller chunks to be processed separately on different worker nodes and automatically gathers the results across the multiple nodes to return a single result.
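As a rough illustration of the "smaller chunks" idea, the sketch below splits a dataset of a known size into contiguous byte ranges, one per map task. The function name `split_ranges` and its parameters are hypothetical and are not taken from this repository. A real splitter would also need to avoid cutting a word in half at a chunk boundary, which this sketch does not handle.

```python
def split_ranges(total_size, num_chunks):
    """Return a list of (start, end) byte offsets, end exclusive.

    Hypothetical helper: the ranges are contiguous, cover the whole
    dataset, and differ in size by at most one byte.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(total_size, num_chunks)
    ranges = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional byte each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(split_ranges(10, 3))
```

Each range could then be handed to a separate map worker, which reads only its own slice of the dataset.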

As its name suggests, it allows for distributed processing of the map() and reduce() functional operations, which carry out most of the programming logic.

The general MapReduce process for counting the frequency of each word, known as Wordcount, works as follows. Each map phase receives its input and emits intermediate (key, value) pairs, where the key is the word itself and the value is the word's current frequency, namely 1. The shuffling phase guarantees that all pairs with the same key serve as input to exactly one reducer, so the reduce phase can compute the frequency of each word simply by summing the values it receives.
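The three phases described above can be sketched in a few lines of plain Python running in a single process. This is only a minimal local illustration of the model, not the code in this repository: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and splitting words on whitespace is an assumption.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit one (word, 1) pair for every word in a chunk of text."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle_phase(mapped_outputs):
    """Group values by key so each key goes to exactly one reducer."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the 1s collected for a word to get its frequency."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce sequentially over a list of chunks."""
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

In a distributed setting each call to `map_phase` would run on a different worker, and the shuffle would move data between nodes instead of between in-memory lists.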

Configuration & Execution

To run the project, your system must have Python 3, the IBM Cloud Functions client, and the following packages installed:

pip3 install boto3
pip3 install ibm-cos-sdk

Once this is done, we can proceed to run the following script in the project directory:

./startCF

Finally, we can run the application using this next line:

python3 orchestrator.py DATASET NUMBER_OF_MAPS

Note: To run the project successfully, the IBM configuration file must be edited so that it connects to your own COS and CF.
