Fogbank Scripts

This repository contains scripts for a computational cluster running Hadoop. Usage instructions for the scripts are in docs/script_usage.rst.

The overall goal is to collect statistics on network traffic and on each node's utilisation while the Hadoop cluster executes tasks. These statistics are then used to infer how the tasks affect the nodes.

Three components help achieve this goal:

Providing connectivity (and monitoring capabilities) between nodes
Creating a custom image that can be loaded onto each node
Automated collection of statistics when a task is run

1. Connectivity and monitoring

The nodes are connected by an SDN (Software Defined Networking) switch. The software that controls the switch's behaviour (the controller) runs on the master node/control PC. Through an SDN protocol (OpenFlow), the switch monitors the network traffic and relays the information back to the controller.

A more detailed explanation can be found in docs/1_concept_explanation.rst. Installation instructions can be found in docs/1_installation.rst.
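As a rough illustration of how relayed traffic information can become a network statistic, the sketch below turns two readings of a port's cumulative byte counters into a throughput figure. OpenFlow port statistics include received and transmitted byte counters, but the `PortSample` structure and the `throughput_bps` function are hypothetical names used only for this example; how the controller in this repository actually exposes the counters is described in the docs, not here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PortSample:
    """One reading of a switch port's cumulative byte counters.

    Hypothetical structure for illustration; not taken from this repository.
    """
    timestamp: float  # seconds since an arbitrary reference point
    rx_bytes: int     # cumulative bytes received on the port
    tx_bytes: int     # cumulative bytes transmitted on the port


def throughput_bps(earlier: PortSample, later: PortSample) -> Tuple[float, float]:
    """Return (receive, transmit) throughput in bits per second.

    The counters are cumulative, so the rate is the counter difference
    divided by the time elapsed between the two samples.
    """
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    rx = (later.rx_bytes - earlier.rx_bytes) * 8 / elapsed
    tx = (later.tx_bytes - earlier.tx_bytes) * 8 / elapsed
    return rx, tx
```

Polling the counters at a fixed interval and applying this calculation to consecutive samples would give a per-port throughput series for the duration of a task.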

2. Custom image for nodes

To provide an easily scalable cluster, a custom image is created for slave nodes. The image runs Xubuntu and comes with Hadoop installed. A node is then able to boot the image from the PXE (Preboot eXecution Environment) server on the master node/control PC. The topology of the setup is described in docs/2_topology.rst.

Create image and PXE server

Instructions for creating a custom image and setting up the PXE server are in docs/3_pxe_boot.rst.

Install Hadoop

Install Hadoop on the image (and on the master node) using the instructions from docs/4_1_single_node_hadoop.rst.

Running Hadoop on the cluster

Run Hadoop on the cluster by following docs/4_2_multi_node_hadoop.rst. Start up the cluster using the instructions in docs/5_node_setup.rst.

Install Hive and Tez on master node

Hive provides a way to manage databases on Hadoop. Tez processes data for Hive. Install them both on the master node by following docs/4_3_hive_tez.rst.

3. Automated statistics collection

The run_multiple_queries.py script automates the collection of statistics during a task. It runs a Hive query multiple times and generates graphs from the statistics collected while the query runs.

More details can be found in docs/6_automated_queries.rst.
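The idea of running the same query several times and recording a measurement for each run can be sketched as follows. This is a minimal illustration, not the contents of run_multiple_queries.py: the function names `time_repeated_runs` and `summarise` are hypothetical, and the demo command is a placeholder that simply starts and exits a Python interpreter.

```python
import statistics
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def time_repeated_runs(command: Sequence[str], repeats: int) -> List[float]:
    """Run `command` `repeats` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run(command, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations: Sequence[float]) -> Dict[str, float]:
    """Reduce the per-run durations to a few summary figures."""
    return {
        "runs": len(durations),
        "mean_s": statistics.mean(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command (an assumption for this sketch): a no-op Python process.
    demo_command = [sys.executable, "-c", "pass"]
    print(summarise(time_repeated_runs(demo_command, repeats=3)))
```

On the cluster, the placeholder could be swapped for a Hive invocation such as `["hive", "-f", "query.hql"]`, where `query.hql` is a hypothetical file name. Repeating the run gives several samples per query, which makes the resulting graphs less sensitive to a single unusually slow or fast execution.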
