diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 42274d2..4cac29d 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -3,18 +3,16 @@ [Welcome](home.md) - [Installation & Set-up](./chapter1/getting-started.md) - - [GitHub](./chapter1/github.md) - [Windows](./chapter1/windows.md) - [Mac](./chapter1/mac.md) - [Linux](./chapter1/linux.md) - [WSL](./chapter1/wsl.md) - - [M3 MASSIVE]() - - [Nectar Cloud]() + - [M3 MASSIVE](./chapter1/m3.md) + - [Nectar Cloud](./chapter1/nectar.md) - [Challenges](./chapter1/challenges.md) - [Intro to C](./chapter2/intro-to-c.md) - - [Hello World](./chapter2/helloworld.md) - [Compilation](./chapter2/compilation.md) - [Types & Variables](./chapter2/vars.md) @@ -27,7 +25,6 @@ - [Challenges]() - [Operating Systems]() - - [Components of Linux]() - [Memory & IO]() - [Processes & Scheduling]() @@ -35,7 +32,6 @@ - [Inter-Process Communication]() - [More C]() - - [Pointers](./chapter2/pointers.md) - [Dynamic Memory](./chapter2/memory.md) - [Structures](./chapter2/structs.md) @@ -44,50 +40,26 @@ - [Spawning Processes & Threads]() - [Challenges](./chapter2/challenges.md) -- [M3 & SLURM](./chapter3/chapter3.md) +- [M3 & SLURM](./chapter5/chapter5.md) - - [Login - SSH & Strudel](./chapter3/login.md) - - [Batch vs. Stream Processing]() - - [Cluster Architectures]() - - [Schedmd's SLURM]() - - [M3 Interface & Usage]() - - [Job Scripting]() + - [Batch Processing vs. Cloud Computing](./chapter5/batch-cloud.md) + - [Parallel & Distributed Computing](./chapter5/parallel-distributed.md) + - [M3 Login - SSH & Strudel](./chapter5/login.md) + - [Schedmd's SLURM](./chapter5/slurm.md) + - [M3 Interface & Usage](./chapter5/m3-interface.md) + - [Job Scripting](./chapter5/job-scripting.md) - [Advanced SLURM]() - - [Challenges](./chapter3/challenges.md) - -- [Virtualisation & DevOps]() + - [Challenges](./chapter5/challenges.md) - - [VMs & Hypervisors]() - - [Containers]() - - [Virtual Environments]() - - [Application Deployments]() - - [DevOps]() - - [Configuration Management]() - -- [Networking]() - - - [TCP/IP Stack & OSI Model]() - - [Application Layer]() - - [Transport Layer]() - - [Internet & other Layers]() - - [Socket Programming]() - - [Cyber Security]() - -- [Intro to Parallel Computing]() - - - [Concurrent Execution]() - - [Driver-Worker Architecture]() - - [Global & Local Phases]() - - [Data Parallelism]() - - [Model Parallelism]() - - [Pipeline Parallelism]() - - [Synchronisation Issues]() - - [Deadlocks]() - - [Mutexes & Semaphores]() - - [Thread Safety]() +- [Introduction to Parallel Computing](./chapter8/chapter8.md) + - [Multithreading](./chapter8/multithreading.md) + - [Synchronisation](./chapter8/synchronisation.md) + - [Locks](./chapter8/locks.md) + - [Message Passing](./chapter8/message-passing.md) + - [Types of Parallelism](./chapter8/parallelism.md) + - [Challenges](./chapter8/challenges.md) - [Parallellisation of Algorithms]() - - [Distributed Databases]() - [Parallel Search]() - [Parallel Sort]() @@ -102,10 +74,10 @@ - [Job Batching](./chapter10/job-batching.md) - [Challenges](./chapter10/challenges.md) -- [Being a HPC Member]() +- [Being a HPC Member](./chapter11/chapter11.md) - - [Expectations & Leadership]() - - [Project Workflow]() - - [Academic Supervisors & Papers]() + - [Expectations & Leadership](./chapter11/expectations-leadership.md) + - [Project Workflow](./chapter11/project-workflow.md) + - [Academic Supervisors & Papers](./chapter11/supervisors-papers.md) [Acknowledgements](./acknowledgements.md) diff --git a/src/chapter1/aaf.png b/src/chapter1/aaf.png 
new file mode 100644 index 0000000..836a38d Binary files /dev/null and b/src/chapter1/aaf.png differ diff --git a/src/chapter1/hpcid.png b/src/chapter1/hpcid.png new file mode 100644 index 0000000..97ef6ae Binary files /dev/null and b/src/chapter1/hpcid.png differ diff --git a/src/chapter1/join_project.png b/src/chapter1/join_project.png new file mode 100644 index 0000000..f50e537 Binary files /dev/null and b/src/chapter1/join_project.png differ diff --git a/src/chapter3/start.md b/src/chapter1/m3.md similarity index 77% rename from src/chapter3/start.md rename to src/chapter1/m3.md index ab22ab8..226a963 100644 --- a/src/chapter3/start.md +++ b/src/chapter1/m3.md @@ -1,14 +1,19 @@ -# Getting Started +# M3 MASSIVE + +MASSIVE (Multi-modal Australian ScienceS Imaging and Visualisation Environment) is a HPC supercomputing cluster that you will have access to as an MDN member. On this page we will set you up with access before you learn how to use it in Chapter 5. Feel free to go through the docs to learn about the [hardware config](https://docs.massive.org.au/M3/m3users.html) of M3 (3rd version of MASSIVE) and its [institutional governance](https://massive.org.au/about/about.html#governance). ## Request an account -In order to access M3, you will need to request an account. To do this, follow this link: [HPC ID](https://hpc.erc.monash.edu.au/karaage/aafbootstrap). This should take you to a page this this: +In order to access M3, you will need to request an account. To do this, follow this link: [HPC ID](https://hpc.erc.monash.edu.au/karaage/aafbootstrap). This should take you to a page like this: + -![HPC ID](./imgs/aaf.png) +![HPC ID](./aaf.png) Type in Monash, as you can see here. Select Monash University, and tick the Remember my organisation box down the bottom. Once you continue to your organisation, it will take you to the Monash Uni SSO login page. You will need to login with your Monash credentials. -You should now see something like this: ![HPC ID System](./imgs/hpcid.png) +You should now see something like this: + +![HPC ID System](./hpcid.png) Once you are here, there are a couple things you will need to do. The first, and most important is to set your HPC password. This is the password you will use to login to M3. To do this, go to home, then click on Change Linux Password. This will take you through the steps of setting your password. @@ -16,7 +21,9 @@ Once you have done this, you can move on to requesting access to the MDN project ## Add to project -To request to join the MDN project, again from the Home page click on Join Exiting Project. You should see a screen like this: ![Join Project](./imgs/join_project.png) +To request to join the MDN project, again from the Home page, click on Join Existing Project. You should see a screen like this: + +![Join Project](./join_project.png) In the text box type `vf38` and click search. This is the project code for MDN. Then select the project and click submit. You will now have to wait for the project admins to approve your request. Once they have done this, you will be able to access the project. This should not take longer than a few days, and you will get an email telling you when you have access. @@ -47,4 +54,4 @@ cat ~/.ssh/id_ed25519.pub Then, go to your github account, go to settings, and click on the SSH and GPG keys tab. Click on New SSH key, and paste the key into the box. Give it a name, and click Add SSH key. -You should now be able to clone repos using SSH.
To do this, go to the repo you want to clone, but instead of copying the HTTP link, copy the SSH link, and then its regular git cloning. +You should now be able to clone repos using SSH. To do this, go to the repo you want to clone, but instead of copying the HTTP link, copy the SSH link, and then it's just regular git cloning. \ No newline at end of file diff --git a/src/chapter1/nectar-login.png b/src/chapter1/nectar-login.png new file mode 100644 index 0000000..8f4fed0 Binary files /dev/null and b/src/chapter1/nectar-login.png differ diff --git a/src/chapter1/nectar.md b/src/chapter1/nectar.md new file mode 100644 index 0000000..53624a1 --- /dev/null +++ b/src/chapter1/nectar.md @@ -0,0 +1,16 @@ +# Nectar Cloud + +The ARDC Nectar Research Cloud (Nectar) is Australia’s national research cloud, specifically designed for research computing. Like with M3, we will set you up with access now before you learn about it in later chapters. [This webpage](https://ardc.edu.au/services/ardc-nectar-research-cloud/) explains what it is if you're curious. + +## Connect Monash Account to Nectar Cloud +To create an [identity](https://medium.com/@ciente/identity-and-access-management-iam-in-cloud-computing-2777481525a4) (account) in Nectar Cloud, all you have to do is log in using your Monash student account. Click [this link](https://dashboard.rc.nectar.org.au) to access Nectar's landing page. + +You will see the following. Make sure to click "Login via AAF (Australia)". + +![nectar](./nectar-login.png) + +You will be redirected to enter your Monash credentials, after which you will see the Nectar Cloud dashboard for your trial project (your project name will be pt-xxxxx). + +## Cloud Starter Series + +ARDC has provided [this cloud starter tutorial series](https://tutorials.rc.nectar.org.au/cloud-starter/01-overview) for people new to Nectar Cloud. You should be able to follow these tutorials using your trial project. If you need more SUs (service units, a.k.a. cloud credits) in order to provision more cloud resources for MDN-related work, you should message your HPC Lead with that request. \ No newline at end of file diff --git a/src/chapter11/chapter11.md new file mode 100644 index 0000000..ea51daa --- /dev/null +++ b/src/chapter11/chapter11.md @@ -0,0 +1,6 @@ +# Being a HPC Member + +Congratulations! You've completed all your technical new recruit training! + +At this point it's important to remember that technical skills are only a part of what's required to succeed as a HPC member. +Without good teamwork skills and collaboration practices, we will not be able to work together effectively and achieve our goals. To that end, this chapter outlines some basic expectations required of all HPC members along with other non-technical information that you might find useful.
\ No newline at end of file diff --git a/src/chapter11/expectations-leadership.md b/src/chapter11/expectations-leadership.md new file mode 100644 index 0000000..e6e1e14 --- /dev/null +++ b/src/chapter11/expectations-leadership.md @@ -0,0 +1 @@ +# Expectations & Leadership \ No newline at end of file diff --git a/src/chapter11/project-workflow.md b/src/chapter11/project-workflow.md new file mode 100644 index 0000000..1bd8c6c --- /dev/null +++ b/src/chapter11/project-workflow.md @@ -0,0 +1 @@ +# Project Workflow \ No newline at end of file diff --git a/src/chapter11/supervisors-papers.md b/src/chapter11/supervisors-papers.md new file mode 100644 index 0000000..49bf64d --- /dev/null +++ b/src/chapter11/supervisors-papers.md @@ -0,0 +1 @@ +# Academic Supervisors & Papers \ No newline at end of file diff --git a/src/chapter3/chapter3.md b/src/chapter3/chapter3.md deleted file mode 100644 index 16097dd..0000000 --- a/src/chapter3/chapter3.md +++ /dev/null @@ -1,7 +0,0 @@ -# M3 - -[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides as majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores. - -M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on. - -This book will take you through the basics of connecting to M3, submitting jobs, transferring data to and from the system and some other things. If you want to learn more about M3, you can read the [M3 documentation](https://docs.massive.org.au/M3/index.html). This will give you a more in-depth look at the system, and how to use it. diff --git a/src/chapter3/imgs/hpcid.png b/src/chapter3/imgs/hpcid.png deleted file mode 100644 index be747b6..0000000 Binary files a/src/chapter3/imgs/hpcid.png and /dev/null differ diff --git a/src/chapter3/imgs/join_project.png b/src/chapter3/imgs/join_project.png deleted file mode 100644 index 070d055..0000000 Binary files a/src/chapter3/imgs/join_project.png and /dev/null differ diff --git a/src/chapter3/linux-cmds.md b/src/chapter3/linux-cmds.md deleted file mode 100644 index 7057ccc..0000000 --- a/src/chapter3/linux-cmds.md +++ /dev/null @@ -1,47 +0,0 @@ -# Linux Commands - -Even if you are already familiar with linux, please read through all of these commands, as some are specific to M3. - -## Basic Linux Commands - -| Command | Function | -| --- | --- | -| `pwd` | prints current directory | -| `ls` | prints list of files / directories in current directory (add a `-a` to list everything, including hidden files/directories | -| `mkdir` | makes a directory | -| `rm ` | deletes *filename*. add `-r` to delete directory. add `-f` to force deletion (be really careful with that one) | -| `cd ` | move directory. | -| `vim` or `nano` | bring up a text editor | -| `cat ` | prints contents of file to terminal | -| `echo` | prints whatever you put after it | -| `chmod ` | changes permissions of file | -| `cp` | copy a file or directory| -| `mv ` | move or rename file or directory | - -> Note: `.` and `..` are special directories. `.` is the current directory, and `..` is the parent directory. These can be used when using any command that takes a directory as an argument. 
Similar to these, `~` is the home directory, and `/` is the root directory. For example, if you wanted to copy something from the parent directory to the home directory, you could do `cp ../ ~/`, without having to navigate anywhere. - -## Cluster Specific Commands - -| Command | Function | Flags -| --- | --- | --- | -| `show_job` | prints information about your jobs | -| `show_cluster` | prints information about the cluster | -| `user_info` | prints information about your account | -| `squeue` | prints information about your jobs | `-u ` to print information about a specific user | -| `sbatch ` | submit a job to the cluster | -| `scontrol show job ` | prints information about a specific job | -| `scancel ` | cancel a job | - -## M3 Specific Commands - -| Command | Function | -| --- | --- | -| `module load ` | load a module | -| `module unload ` | unload a module | -| `module avail` | list available modules | -| `module list` | list loaded modules | -| `module spider ` | search for a module | -| `module help ` | get help for a module | -| `module show ` | show details about a module | -| `module purge` | unload all modules | -| `module swap ` | swap two modules | \ No newline at end of file diff --git a/src/chapter3/shared-fs.md b/src/chapter3/shared-fs.md deleted file mode 100644 index f881310..0000000 --- a/src/chapter3/shared-fs.md +++ /dev/null @@ -1,79 +0,0 @@ -# M3's Shared Filesystem - -When we talk about a shared filesystem, what we mean is that the filesystem that M3 uses allows multiple users or systems to access, manage, and share files and directories over a network, concurrently. It enables users to collaborate on projects, share resources, and maintain a unified file structure across different machines and platforms. In addition to this, it enables the many different compute nodes in M3 to access data from a single source which users also have access to, simplifying the process of running jobs on M3. - -Very simply, the way it works is that the home, project and scratch directories are mounted on every node in the cluster, so they are accessible from any node. - -M3 has a unique filesystem consisting of three main important parts (for you). - -## Home Directory - -There is each user's personal directory, which only they have access to. This has a ~10GB allocation, and should store any hidden files, configuration files, or other files that you don't want to share with others. This is backed up nightly. - -## Project Directory - -This is the shared project directory, for all users in MDN to use. This has a ~1TB allocation, and should be used only for project specific files, scripts, and data. This is also backed up nightly, so in the case that you accidentally delete something important, it can be recovered. - -## Scratch Directory - -This is also shared with all users in MDN, and has more allocation (~3TB). You may use this for personal projects, but keep your usage low. In general it is used for temporary files, larger datasets, and should be used for any files that you don't need to keep for a long time. This is not backed up, so if you delete something, it's gone forever. - -## General Rules - -- Keep data usage to a minimum. If you have a large amount of data, consider moving it to the scratch directory. If it is not necessary to keep it, consider deleting it. -- Keep your home directory clean. -- In general, it is good practice to make a directory in the shared directory for yourself. Name this your username or name, to make it easily identifiable. 
This is where you should store your files for small projects or personal use. -- The project directory is not for personal use. Do not store files in the project directory that are not related to MDN. Use the scratch directory instead. - -## Copying files to and from M3 - -Copying files to and from M3 can be done in a few different ways. We will go over the basics of scp, as well as setting up FileZilla. - -A key thing to remember when copying files to and from M3 is that you shouldn't be using the regular ssh url. Instead, they have a dedicated SFTP url to use for file transfers. This is `m3-dtn.massive.org.au`. This is the url you will use when setting up FileZilla, and when using scp. - -### Using scp - -You can copy files to M3 using the `scp` command. This is a command line tool that is built into most linux distributions. If you are using Windows, you will need to install a tool like [Git Bash](https://gitforwindows.org/) to use this command. - -#### Linux / Mac - -To copy a file to M3, use the following command: - -```bash -scp @m3-dtn.massive.org.au: -``` - -For example, if I wanted to copy a file called `test.txt` to my home directory on M3, I would use the following command: - -```bash -scp test.txt jasparm@m3-dtn.massive.org.au:~ -``` - -To copy a file from M3 to your local machine, use the following command: - -```bash -scp @m3-dtn.massive.org.au: -``` - -So, to bring that same file back to my local machine, I would use the following command: - -```bash -scp jasparm@m3-dtn.massive.org.au:~/test.txt . -``` - -#### FileZilla - -FileZilla is a SFTP client that the M3 staff recommend using. You can download it [here](https://filezilla-project.org/download.php?show_all=1). - -Once installed, run the program and click on File -> Site Manager or `Ctrl-S`. This will open the site manager. Click on New Site, and enter the following details: - -- Protocol: SFTP -- Host: `m3-dtn.massive.org.au` -- Logon Type: Ask for password -- User: `` - -Don't change anything else. Leave password blank for now. - -It should look something like this: -![Add M3 as a site](./imgs/filezilla_connect_m3.png) -Click on Connect, and enter your password when prompted. You should now be connected to M3. You can now drag and drop files to and from M3. diff --git a/src/chapter3/software-tooling.md b/src/chapter3/software-tooling.md deleted file mode 100644 index a3ec5eb..0000000 --- a/src/chapter3/software-tooling.md +++ /dev/null @@ -1,113 +0,0 @@ -# Software and Tooling - -Software and development tooling is handled a little differently on M3 than you might be used to. In particular, because M3 is a shared file system, you do not have access to `sudo`, and you cannot install software on the system manually. Instead, you will need to use the `module` command to load software and development tools. - -## Module - -The `module` command is used kind of as an alternative to package managers like `apt` or `yum`, except it is managed by the M3 team. It allows you to load software and development tools into your environment, and is used to load software on M3. To see a comprehensive list of commands go [here](./linux-cmds.md#m3-specific-commands). - -In general, however, you will only really need to use `module load` and `module unload`. These commands are used to load and unload software and development tools into your environment. - -For most of the more popular software packages, like gcc, there are multiple different versions available. You will need to specify which version you want to load based on your needs. 
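For example, you can first list the versions of a package that are actually installed on the cluster and then load the one you need. A minimal sketch (the exact version strings reported by `module avail` will depend on what the M3 team currently provides):

```bash
# List the available versions of a package, e.g. gcc
module avail gcc

# Load one specific version explicitly
module load gcc/10.2.0

# Confirm which modules are now active in your environment
module list
```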
- -## C - -### GCC - -To load GCC, you can run the following command: - -```bash -module load gcc/10.2.0 -``` - -This will load GCC 10.2.0 into your environment, and you can use it to compile C/C++ programs as described in the [Intro to C](../chapter2/intro-to-c.md) chapter. To unload GCC, you can run the following command: - -```bash -module unload gcc/10.2.0 -``` - -## Python - -Python is a bit of a special case on M3. This is because of how many different versions there are, as well as how many different packages are available. To make things easier, it is recommended that you use miniconda or anaconda to manage your python environments instead of using the system python. - -These instructions are based off the M3 docs, which can be found [here](https://docs.massive.org.au/M3/software/pythonandconda/pythonandconda.html#pythonandconda). - -### Miniconda - -#### Installing Miniconda - -To install Miniconda on M3, there is a dedicated install script that you can use. This will install miniconda into your default scratch space, i.e. `/vf38_scratch//miniconda3`. To install miniconda, run the following command: - -```bash -module load conda-install - -# To install miniconda to the default location -conda-install - -# To install miniconda to a custom location -conda-install your/install/location -``` - -#### Activating Miniconda - -To activate the base conda environment, run the following command: - -```bash -source your/install/location/miniconda/bin/activate -``` - -You will notice that once activated, `(base)` will appear in the prompt before your username. - -To create and activate Python environments within Miniconda, follow these steps: - -```bash -# Create a new environment -# Change env-name to whatever you want to call your environment -conda create --name env-name python= - -# Activate the environment -conda activate env-name -``` - -#### Managing Python packages - -Use the following commands to install and manage Python packages: - -```bash -# Install a package -conda install package-name - -# Update a package -conda update package-name - -# You can also change the version of packages by adding a = and the version number - -# Remove a package -conda remove package-name -``` - -#### Deactivating Miniconda - -To deactivate the conda environment you are in, run `conda deactivate`. To exit conda entirely run `conda deactivate` again. You will know you have fully exited conda when `(base)` is no longer in the prompt. - -### VIM - -VIM is a terminal based text editor. You may have heard about it, or even tried using it before. If so, you might recognise the common meme of "how do I exit VIM???". This is because VIM uses a very different key binding system to other text editors, and it can be a little confusing to get used to. However, once you get used to it, it is actually a very powerful and efficient text editor. - -I will attemt to give a brief overview of VIM commands, however you should really check out the [VIM documentation](https://vimhelp.org/) if you want to learn more. - -VIM also has a built in tutorial that you can access by running `:help` while in VIM. - -To use VIM to edit a file, just type `vim ` into the terminal. This will open the file in VIM. If the file does not exist, it will create a new file with that name. - -VIM has three different modes. The first is the command mode, which is the default mode when you open a file. In this mode, you can navigate around the file, and perform other commands. 
The second is the insert mode, which is used to insert text into the file. The third is the visual mode, which is used to select text. - -To enter the insert mode, press `i`. To exit the insert mode, press `esc`. To enter the visual mode, press `v`. To exit the visual mode, press `esc`. - -In command mode, you move around using `h`, `j`, `k`, `l`. To move along words, press `w` or `b`. To move to the start or end of the line, press `0` or `$`. You can delete a line using `dd`, or delete a word using `dw`. You might be noticing some patterns here. In VIM, commands are made up of single or multiple characters that represent different things. For example, if I wanted to delete a word, I would press `d` to delete, and then `w` to delete a word. If I wanted to delete 3 words, I would press `d3w`. If I just wanted to change a word, I would press `c` instead of `d`. If I wanted to change 3 words, I would press `c3w`. If I wanted to change a line, I would press `cc`. Some other useful command mode commands are `u` to undo, `o` to insert a new line and go into insert mode, and `?` to search for a string. - -To get to insert mode, there are a lots of different ways, but the most common are `i` to insert text before the cursor, `a` to insert text after the cursor, and `o` to insert a new line. The capital versions of these also do things. `I` inserts text at the start of the line, `A` inserts text at the end of the line, and `O` inserts a new line above the current line. To exit insert mode, press `esc`. - -To get to visual mode, press `v`. In visual mode, you can select text using the same commands as in command mode. To delete the selected text, press `d`. To change the selected text, press `c`. To copy the selected text, press `y`. To paste press `p`. To exit visual mode, press `esc`. - -To exit VIM itself, enter command mode, and then press `:q!`. This will exit VIM without saving any changes. To save and exit, press `:wq`. To save without exiting, press `:w`. diff --git a/src/chapter3/strudel.md b/src/chapter3/strudel.md deleted file mode 100644 index 2b34a9f..0000000 --- a/src/chapter3/strudel.md +++ /dev/null @@ -1,31 +0,0 @@ -# Strudel - -STRUDEL is a web application used to connect to M3. There are two main benefits to this over regular ssh. Firstly, you are able to access a desktop session, so you can interact easier with M3, look at graphs, etc.. STRUDEL also enables the use of Jupyter notebooks, which are especially useful for data science and machine learning. - -## Accessing STRUDEL - -First, go to the [STRUDEL](https://beta.desktop.cvl.org.au/) website. You should see something like this: - -![strudel select cvl](imgs/strudel1.png) - -Select the CVL option, and you should be taken to another page, where you choose how to log in. - -![strudel login](imgs/strudel2.png) - -Select AAF. On the next page, search for and select Monash University. - -![AAF Login](imgs/aaf_strudel.png) - -You will now be taken to the Monash login page. Once you have logged in, it will show one last page, asking permission to use your details. Click allow, and you will be taken to the STRUDEL home page. - -![strudel home page](imgs/strudel_home.png) - -## Desktop Session - -To start a desktop session using STRUDEL, click on the **Desktop** tab on the side, select your desired options, and click launch. Once the session has started, you will be able to attach to it by clicking on the connect button in the *Pending / Running Desktops* section. 
- -## Jupyter Notebooks - -Similar to Desktops, if you want a basic Jupyter notebook, click on the **Jupyter Lab** tab, choose how much compute you want, and click launch. - -If you want to have a more customised Jupyter notebook, you can do this by first sshing into M3, and activate conda. Then activate the conda environment `jupyterlab`. Install you desired packages in this environment. Once you have done this, go back to STRUDEL, and launch a **Jupyter Lab - BYO** session. \ No newline at end of file diff --git a/src/chapter4/chapter4.md b/src/chapter4/chapter4.md deleted file mode 100644 index 00f7e45..0000000 --- a/src/chapter4/chapter4.md +++ /dev/null @@ -1,3 +0,0 @@ -# Parallel Computing - -In this chapter we discuss parallel computing and its uses in developing fast applications. We then look at how OpenMP allows us to parallelize or code to make it faster. diff --git a/src/chapter4/imgs/4 Parallel Computing OpenMP.gif b/src/chapter4/imgs/4 Parallel Computing OpenMP.gif deleted file mode 100644 index 1006c16..0000000 Binary files a/src/chapter4/imgs/4 Parallel Computing OpenMP.gif and /dev/null differ diff --git a/src/chapter4/imgs/Distributed Memory Architecture.png b/src/chapter4/imgs/Distributed Memory Architecture.png deleted file mode 100644 index 31b04a5..0000000 Binary files a/src/chapter4/imgs/Distributed Memory Architecture.png and /dev/null differ diff --git a/src/chapter4/imgs/Hybrid Parallel Programming.png b/src/chapter4/imgs/Hybrid Parallel Programming.png deleted file mode 100644 index 2de418b..0000000 Binary files a/src/chapter4/imgs/Hybrid Parallel Programming.png and /dev/null differ diff --git a/src/chapter4/imgs/Monte Carlo.png b/src/chapter4/imgs/Monte Carlo.png deleted file mode 100644 index 0b08413..0000000 Binary files a/src/chapter4/imgs/Monte Carlo.png and /dev/null differ diff --git a/src/chapter4/imgs/Parallel Computing Example.png b/src/chapter4/imgs/Parallel Computing Example.png deleted file mode 100644 index a5b662b..0000000 Binary files a/src/chapter4/imgs/Parallel Computing Example.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230325113147.png b/src/chapter4/imgs/Pasted image 20230325113147.png deleted file mode 100644 index ec3a879..0000000 Binary files a/src/chapter4/imgs/Pasted image 20230325113147.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230325113254.png b/src/chapter4/imgs/Pasted image 20230325113254.png deleted file mode 100644 index 375a898..0000000 Binary files a/src/chapter4/imgs/Pasted image 20230325113254.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230325113303.png b/src/chapter4/imgs/Pasted image 20230325113303.png deleted file mode 100644 index be44f48..0000000 Binary files a/src/chapter4/imgs/Pasted image 20230325113303.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230325113312.png b/src/chapter4/imgs/Pasted image 20230325113312.png deleted file mode 100644 index 3c98d3e..0000000 Binary files a/src/chapter4/imgs/Pasted image 20230325113312.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230325113329.png b/src/chapter4/imgs/Pasted image 20230325113329.png deleted file mode 100644 index 4fc23d5..0000000 Binary files a/src/chapter4/imgs/Pasted image 20230325113329.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230326141615.png b/src/chapter4/imgs/Pasted image 20230326141615.png deleted file mode 100644 index feed862..0000000 Binary files a/src/chapter4/imgs/Pasted image 
20230326141615.png and /dev/null differ diff --git a/src/chapter4/imgs/Pasted image 20230326142826.png b/src/chapter4/imgs/Pasted image 20230326142826.png deleted file mode 100644 index 92186c1..0000000 Binary files a/src/chapter4/imgs/Pasted image 20230326142826.png and /dev/null differ diff --git a/src/chapter4/imgs/Running Processes in Parallel.png b/src/chapter4/imgs/Running Processes in Parallel.png deleted file mode 100644 index 8c8d66d..0000000 Binary files a/src/chapter4/imgs/Running Processes in Parallel.png and /dev/null differ diff --git a/src/chapter4/imgs/Shared Memory Architecture.png b/src/chapter4/imgs/Shared Memory Architecture.png deleted file mode 100644 index 1d20303..0000000 Binary files a/src/chapter4/imgs/Shared Memory Architecture.png and /dev/null differ diff --git a/src/chapter4/imgs/Slurm Architecture.png b/src/chapter4/imgs/Slurm Architecture.png deleted file mode 100644 index e3547ff..0000000 Binary files a/src/chapter4/imgs/Slurm Architecture.png and /dev/null differ diff --git a/src/chapter4/imgs/Thread vs Processes.png b/src/chapter4/imgs/Thread vs Processes.png deleted file mode 100644 index a1179ea..0000000 Binary files a/src/chapter4/imgs/Thread vs Processes.png and /dev/null differ diff --git a/src/chapter4/imgs/Time Command.png b/src/chapter4/imgs/Time Command.png deleted file mode 100644 index ed459a6..0000000 Binary files a/src/chapter4/imgs/Time Command.png and /dev/null differ diff --git a/src/chapter4/imgs/Top Command.png b/src/chapter4/imgs/Top Command.png deleted file mode 100644 index 713220b..0000000 Binary files a/src/chapter4/imgs/Top Command.png and /dev/null differ diff --git a/src/chapter4/imgs/sbatch Command.png b/src/chapter4/imgs/sbatch Command.png deleted file mode 100644 index feed862..0000000 Binary files a/src/chapter4/imgs/sbatch Command.png and /dev/null differ diff --git a/src/chapter4/imgs/show_cluster Command.png b/src/chapter4/imgs/show_cluster Command.png deleted file mode 100644 index a533f48..0000000 Binary files a/src/chapter4/imgs/show_cluster Command.png and /dev/null differ diff --git a/src/chapter4/imgs/squeue Command.png b/src/chapter4/imgs/squeue Command.png deleted file mode 100644 index 391a782..0000000 Binary files a/src/chapter4/imgs/squeue Command.png and /dev/null differ diff --git a/src/chapter4/multithreading.md b/src/chapter4/multithreading.md deleted file mode 100644 index ad0c189..0000000 --- a/src/chapter4/multithreading.md +++ /dev/null @@ -1,65 +0,0 @@ -# Multithreading on HPC - -## Thread vs Process - -![Thread vs Processes](imgs/Thread%20vs%20Processes.png) - -When computer runs a program, your source code is loaded into RAM and process is started. -A **process** is a collection of code, memory, data and other resources. -A process runs in a unique address space. So Two processes can not see each other’s memory. - -A **thread** is a sequence of code that is executed inside the scope of the **process**. You can (usually) have multiple **threads** executing concurrently within the same process. -**Threads** can view the memory (i.e. variables) of other threads within the same process - -A **multiprocessing** system has more than two processors, whereas **multithreading** is a program execution technique that allows a single process to have multiple code segments. - -## Architecture of a HPC Cluster (Massive) - -![Slurm Architecture](imgs/Slurm%20Architecture.png) - -The key in HPC is to write a parallel computing code that utilise multiple nodes at the same time. 
essentially, more computers faster your application - -## Using Massive - -### Find Available Partition - -Command: -```bash -show_cluster -``` - -![show_cluster Command](imgs/show_cluster%20Command.png) - -Before you run your job, it’s important to check the available resources. - -`show_cluster` is a good command to check the available resources such as CPU and Memory. Make sure to also check the status of the of the node, so that your jobs get started without waiting - -### Sending Jobs - -Command: -```bash -#SBATCH --flag=value -``` - -![sbatch Command](imgs/sbatch%20Command.png) - -Here is the example of shell script for running multi-threading job -`#sbatch` specifies resources and then it runs the executable named hello. - -`#sbatch` tasks specifies how many processes to run -Cpus per task is pretty self explanatory, it specifies how many cpu cores you need to run a process, this will be the number of threads used in the job -And make sure to specify which partition you are using - -### Monitor Jobs - -Command: -```bash -squeue -# or -squeue -u -``` - -![squeue Command](imgs/squeue%20Command.png) - -After you submitted your job, you can use the command squeue to monitor your job -you can see the status of your job to check whether it’s pending or running and also how long has it been since the job has started. diff --git a/src/chapter4/openmp.md b/src/chapter4/openmp.md deleted file mode 100644 index 9bcc36a..0000000 --- a/src/chapter4/openmp.md +++ /dev/null @@ -1,122 +0,0 @@ -# Parallel Computing with OpenMP - -## What is OpenMP - -OpenMP, stand for open multi-processing is an API for writing multithreaded applications - -It has a set of compiler directives and library routines for parallel applications, and it greatly simplifies writing multi-threaded code in Fortran, C and C++. - -Just few lines of additional code can make your application parallel  - -OpenMP uses shared memory architecture. It assumes all code runs on a single server - -## Threads - -![Threads Visualisation](imgs/Threads%20Visualisation.png) - -A thread of execution is the smallest instruction that can be managed independently by an operating system. - -In parallel region, multiple threads are spawned and utilises the cores on CPU - -> Only one thread exists in a serial region - -## OpenMP Compiler Directives - -Recall compiler directives in C; particularly the `#pragma` directive. These can be used to create custom functionality for a compiler and enable specialized features in-code. - -`#pragma` is a preprocessor directive that is used to provide additional information to the compiler beyond the standard language syntax. It allows programmers to give hints or directives to the compiler, which the compiler can use to optimize the code or to use specific compiler features or extensions. - -The `#pragma` directive is followed by a keyword that specifies the type of pragma and any additional parameters or options that are needed. For example, the `#pragma omp` directive is used in OpenMP parallel programming to provide hints to the compiler about how to parallelize code. Here are some examples of `#pragma` directives: -- `#pragma once`: This is a commonly used pragma in C and C++ header files to ensure that the header file is included only once in a compilation unit. This can help to prevent errors that can occur when the same header file is included multiple times. -- `#pragma message`: This pragma is used to emit a compiler message during compilation. 
This can be useful for providing additional information to the programmer or for debugging purposes. -- `#pragma warning`: This pragma is used to control compiler warnings. It can be used to turn specific warnings on or off, or to change the severity of warnings. -- `#pragma pack`: This pragma is used to control structure packing in C and C++. It can be used to specify the alignment of structure members, which can affect the size and layout of structures in memory. -- `#pragma optimize`: This pragma is used to control code optimization. It can be used to specify the level of optimization, or to turn off specific optimizations that may be causing problems. - -It is important to note that `#pragma` directives are compiler-specific, meaning that different compilers may interpret them differently or may not support certain directives at all. It is important to check the documentation for a specific compiler to understand how it interprets `#pragma` directives. - -OpenMP provides a set of `#pragma` directives that can be used to specify the parallelization of a particular loop or section of code. For example, the `#pragma omp parallel` directive is used to start a parallel region, where multiple threads can execute the code concurrently. The `#pragma omp for` directive is used to parallelize a loop, with each iteration of the loop being executed by a different thread. - -Here's an example of how `#pragma` directives can be used with OpenMP to parallelize a simple loop: - -```c -#include -#include - -int main() { - int i; - #pragma omp parallel for - for (i = 0; i < 10; i++) { - printf("Thread %d executing iteration %d\n", omp_get_thread_num(), i); - } - return 0; -} -``` - -Use `gcc -fopenmp` to compile your code when you use `#pragma` - -## Compile OpenMP - -1. Add `#include if you are using OpenMP function` -2. Run `gcc -fopenmp -o hello hello.c` - -## How it works - -![OpenMP and Directive](imgs/OpenMP%20and%20Directive.png) -[Source](https://www.researchgate.net/figure/OpenMP-API-The-master-thread-is-indicated-with-T-0-while-inside-the-parallel-region_fig3_329536624 -) - -Here is an example of `#pragma` -- The function starts with serial region -- At the line `#pragma omp parallel`, a group of threads are spawned to create parallel region inside the bracket -- At the end of the bracket, the program goes back to serial computing - -## Running "Hello World" on Multi-threads - ->If you're unsure about the difference between **multi-threading** and **multi-processing**, check the page [here](multithreading.md) - -**Drawing in Serial (Left) vs Parallel (Right)** -![](imgs/4%20Parallel%20Computing%20OpenMP.gif) - -Drawing in serial versus drawing in parallel, you can see how we can place one pixel at a time and take a long time to make the drawing, but on the right hand side if we choose to load and place four pixels down simultaneously we can get the picture faster, however during the execution it can be hard to make out what the final image will be, given we don’t know what pixel will be placed where in each execution step. - -Now this is obviously a fairly abstract analogy compared to exactly what’s happening under the hood, however if we go back to the slide diagram containing zones of multiple threads and serial zones, some parts of a program must be serial as if this program went further and drew a happy face and then a frown face, drawing both at the same time is not useful to the program, yes it would be drawn faster but the final image won’t make sense or achieve the goal of the program. 
- -## How many threads? You can dynamically change it - -**`omp_set_num_threads()` Library Function** -Value is set inside program. Need to recompile program to change - -**`OMP_NUM_THREADS` Environment Variable** - -```bash -export OMP_NUM_THREADS=4 -./hello -``` - -The operating system maps the threads to available hardware. You would not normally want to exceed the number of cores/processors available to you. - -## Measuring Performance - -The command `top` or `htop` looks into a process. As you can see from the image on right, it shows the CPU usages. - -![Top Command](imgs/Top%20Command.png) - -The command `time` checks the overall performance of the code. - -![Time Command](imgs/Time%20Command.png) - -By running this command, you get real time, user time and system time. - -**Real** is wall clock time - time from start to finish of the call. This includes the time of overhead - -**User** is the amount of CPU time spent outside the kernel within the process - -**Sys** is the amount of CPU time spent in the kernel within the process. -**User** time + **Sys** time will tell you how much actual CPU time your process used. - -## More Features of OpenMP - -- [YouTube Video: Introduction to OpenMP](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11 ) -- [YouTube Video: Data environment -\#pragma omp parallel private](https://www.youtube.com/watch?v=dlrbD0mMMcQ&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=17) -- [YouTube Video: Parallel Loops - \#omp parallel for reduction()](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11 ) diff --git a/src/chapter4/parallel-computing.md b/src/chapter4/parallel-computing.md deleted file mode 100644 index 92836b9..0000000 --- a/src/chapter4/parallel-computing.md +++ /dev/null @@ -1,41 +0,0 @@ -# Introduction to Parallel Computing - -## What is Parallel Computing? - -Parallel computing is about executing the instructions of the program simultaneously - -One of the core values of computing is the breaking down of a big problem into smaller easier to solve problems, or at least smaller problems. - -In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than sequentially (in order) - -A supercomputer is not just about fast processors. It is multiple processors working together in simultaneously. Therefore it makes sense to utilise parallel computing in the HPC environment, given the access to large numbers of processors - -![Running Processes in Parallel](imgs/Running%20Processes%20in%20Parallel.png) - -An example of parallel computing looks like this. - -![Parallel Computing Example](imgs/Parallel%20Computing%20Example.png) - -Here there is an array which contains numbers from 0 to 999. The program is to increment each values by 1. Comparing serial code on left and parallel code on right, parallel code is utilising 4 cores of a CPU. Therefore, it can expect approximately 4 times speed up from just using 1 core, what we are seeing here is how the same code can in-fact execute faster as four times as many elements can be updated in the same time one would be. - -## Parallel Computing Memory Architectures - -Parallel computing has various memory architectures - -### Shared Memory Architecture: - -There is shared memory architectures where multiple CPUs runs on the same server. 
OpenMP uses this model - -![Shared Memory Architecture](imgs/Shared%20Memory%20Architecture.png) - -### Distributed Memory Architecture: - -This distributed memory architecture where CPU and memory are bundled together and works by communicating with other nodes. Message passing protocol called lMPI is used in this model - -![Distributed Memory Architecture](imgs/Distributed%20Memory%20Architecture.png) - -### Hybrid Parallel Programming: - -For High Performance Computing (HPC) applications, OpenMP is combined with MPI. This is often referred to as Hybrid Parallel Programming. - -![Hybrid Parallel Programming](imgs/Hybrid%20Parallel%20Programming.png) \ No newline at end of file diff --git a/src/chapter5/batch-cloud.md b/src/chapter5/batch-cloud.md new file mode 100644 index 0000000..6919911 --- /dev/null +++ b/src/chapter5/batch-cloud.md @@ -0,0 +1,29 @@ +# Batch Processing vs. Cloud Computing + +You are all likely familiar with the definition of High Performance Computing. Here is one from IBM, + +> High-performance computing (HPC) is technology that uses clusters of powerful processors that work in parallel to process massive multi-dimensional data sets, also known as big data, and solve complex problems at extremely high speeds. HPC solves some of today’s most complex computing problems in real time. + +But the term HPC is not really used much outside the scientific research community. A lot of cloud systems involve similar scale of hardware, parallel & distributed computing, similar computational workload, data processing capacity and low latency/high throughput capability as HPC clusters. *So what exactly is the difference between a cloud system and a HPC cluster?* + +At the end of the day this comes down to semantics but a key difference is that a HPC cluster implies a system primarily used for **batch processing** whereas a cloud system would involve **interactive processing**. + +## Key Differences + +The vast majority of computer systems and nearly 100% of the ones that the average person uses is a cloud-based interactive system. Due to the nature of use cases specific to researchers, batch processing is a much more suitable choice for them. + +__Batch Processing:__ +- Jobs (code scripts) submitted are executed at a later time. +- User can't interact (or only limited interaction). +- Performance measure is **throughput**. +- Snapshot of output is used for debugging. + +![batch-image](./imgs/batch-processing.jpeg) + +__Interactive Processing:__ +- Jobs submitted are executed immediately. +- User can interact. +- Performance measure is **response time**. +- Interactive debugging. + +![interactive-image](./imgs/interactive-processing.png) \ No newline at end of file diff --git a/src/chapter5/challenges.md b/src/chapter5/challenges.md index e3c3cc5..39f5e4f 100644 --- a/src/chapter5/challenges.md +++ b/src/chapter5/challenges.md @@ -1,54 +1,45 @@ -# Distributed Computing Challenges +# M3 Challenges -## Overview +## Challenge 1 -- [Distributed Computing Challenges](#distributed-computing-challenges) - - [Overview](#overview) - - [Pre-Tasks](#pre-tasks) - - [Task 1 - Multinode 'Hello, world!'](#task-1---multinode-hello-world) - - [Task 2 - Ping Pong](#task-2---ping-pong) - - [Task 3 - Multinode Sum](#task-3---multinode-sum) - - [Task 4 - Multinode Mergesort](#task-4---multinode-mergesort) +Navigate to your scratch directory and, using vim (or your chosen in-terminal editor) create a file called `hello.txt` that contains the text "Hello World". 
Once you have created the file, use the `cat` command to print the contents of the file to the screen. -## Pre-Tasks +## Challenge 2 -For each task you will need to load MPICH using Spack from within your SLURM job script. There is a shared installation of Spack and MPICH within `vf38_scratch`. To load Spack and MPICH use the following to commands within you SLURM job script before any other command. +Write a bash script that prints the contents of the above hello.txt file to the screen and run it locally (on your login node). -```sh -. ~/vf38_scratch/spack/share/spack/setup-env.sh -spack load mpich -``` +## Challenge 3 -A template SLURM job file is given at the root of the distributed challenges directory. Copy this for each challenge into their respective sub-directories as every challenge will require running a SLURM job. If want to do some more experimenting, create multiple job scripts that use different amounts of nodes and test the execution time. +Submit the above script to the queue by writing another SLURM bash script. Check the status of the job using `squeue`. Once the job has finished, check the output using `cat`. You can find the output file in the directory you submitted the job from. -You will also need to generate some input for the sum and mergesort challenges. This can be done by compiling and running the program in `generate.cpp`. Run the following commands to build an generate the inputs for your challenges. +## Challenge 4 -```sh -module load gcc/10.2.0 -g++ -std=c++20 -o bin/generate generate.cpp -bin/generate 1000000000 -``` +Request an interactive node and attach to it. Once you have done this, install python 3.7 using conda. -> Note: -> -> - You do not have to worry about how to read the numbers from the file, this is handled for you already but it is recommended to look at the read function in `read.h` and understand what it is doing. -> - The expected output of the 'sum' challenge is found in the generated `output.txt` file within the challenges directory. -> The expected output of the 'mergesort' challenge is found in the generated `sorted.txt` file within the challenges directory however this will contain a lot of values so a check function is provided that compares a resorted version of your input to your sorted output. -> The sum and mergesort programs you will develop take a number as input. This is the size of the input data that you are performing your programs on. This should be the same number as the one used with the generator program. In the template programs for this challenge they are maked as an pointer to data called `input`. -> Given the above setup and configuration, the input data will contain ~8GB of data or ~8.0e9 bytes so make sure to allocate enough resources both in the programs an in the SLURM job scripts. +## Challenge 5 -## Task 1 - Multinode 'Hello, world!' +Clone and run [this](./dl_on_m3/alexnet_stl10.py) script. You will need to first install the dependencies for it. You don't need to wait for it to finish, just make sure it is working. You will know its working if it starts listing out the loss and accuracy for each epoch. You can stop it by pressing `ctrl + c`. -Your first task is to say 'Hello, world!' from different nodes on M3. This involves printing the nodes name, rank (ID) and the total number of nodes in the MPI environment. +Once you have confirmed that it is working, deactivate and delete the conda environment, and then end the interactive session. 
-## Task 2 - Ping Pong +> Hint: I have included the dependencies and their versions (make sure you install the right version) in the `requirements.txt` file. You will need python 3.7 to run this script. -For this next task you will play a Ping-Pong game of sorts between two nodes. This will involve passing a count between the two nodes and incrementing the count for each send and receive. This should increment the count to 10 in the end. +## Challenge 6 -## Task 3 - Multinode Sum +Go back to the login node. Now you are going to put it all together. Write a bash script that does the following: -Your next task is to sum the numbers in the generated `input.txt` file together across ten nodes. This will involve summing 1,000,000,000 floats together. The rough expected output is contained in the `output.txt` file. Remember the input array is already given in the template file. +- (1) requests a compute node +- (2) installs python using conda +- (3) clones and runs the above script -## Task 4 - Multinode Mergesort +Let this run fully. Check the output of the script to make sure it ran correctly. Does it match the output of the script you ran in challenge 5? +> Hint: You can check the output of the script at any time by `cat`ing the output file. The script does not need to have finished running for you to do this. -Your final task is to sort the numbers from the input file `unsorted.txt` using a distributed version of mergesort. This will involve ten nodes running their won mergesorts on chunks of the input data individually and then a final mergesort of the intermediate results. Remember the input array is already given in the template file. +## Challenge 7 + +Edit your submission script so that you get a gpu node, and run the script using the gpu. +> Hint: Use the m3h partition + +## Challenge 8 + +Now you want to clean up your working directory. First, push your solutions to your challenges repo. Then, delete the challenges directory, as well as the conda environment you created in challenge 6. diff --git a/src/chapter5/chapter5.md b/src/chapter5/chapter5.md index 4d82439..16097dd 100644 --- a/src/chapter5/chapter5.md +++ b/src/chapter5/chapter5.md @@ -1,7 +1,7 @@ -# Distributed Computing +# M3 -- [Refresher on Parallelism](parallel-refresher.md) -- [What is Distributed Computing](distributed-computing.md) -- [OpenMPI](openmpi.md) -- [Message Passing](message-passing.md) -- [Challenges](challenges.md) +[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides as majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores. + +M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on. + +This book will take you through the basics of connecting to M3, submitting jobs, transferring data to and from the system and some other things. If you want to learn more about M3, you can read the [M3 documentation](https://docs.massive.org.au/M3/index.html). This will give you a more in-depth look at the system, and how to use it. 
diff --git a/src/chapter5/distributed-computing.md b/src/chapter5/distributed-computing.md deleted file mode 100644 index 7aa688e..0000000 --- a/src/chapter5/distributed-computing.md +++ /dev/null @@ -1,44 +0,0 @@ -# What is Distributed Computing - -**Distributed computing is parallel execution on distributed memory architecture.** - -This essentially means it is a form of parallel computing, where the processing power is spread across multiple machines in a network rather than being contained within a single system. In this memory architecture, the problems are broken down into smaller parts, and each machine is assigned to work on a specific part. - -![distributed memory architecture](imgs/distributed_memory_architecture.png) - -## Distributed Memory Architecture - -Lets have a look at the distributed memory architecture in more details. - -- Each processor has its own local memory, with its own address space -- Data is shared via a communications network using a network protocol, e.g Transmission Control Protocol (TCP), Infiniband etc.. - -![Distributed Memory Architecture](imgs/distributed_memory_architecture_2.png) - -## Distributed vs Shared program execution - -The following diagram provides another way of looking at the differences between distributed and shared memory architecture and their program execution. - -![Distributed vs Shared](imgs/distributed_vs_shared.png) - -## Advantages of distributed computing - -There are number of benefits to distributed computing in particular it addresses some shortcomings of shared memory architecture. - -- No contention for shared memory since each machine has its own memory. Compare this to shared memory architecture where all the cpu's are sharing the same memory. -- Highly scalable as we can add more machines and are not limited by RAM. -- Effectively resulting in being able to handle large-scale problems - -The benefits above do not come without some drawbacks including network overhead. - -## Disadvantages of distributed computing - -- Network overload. Network can be overloaded by: - - Multiple small messages - - Very large data throughput - - Multiple all-to-all messages ($N^2$ growth of messages) -- Synchronization failures - - Deadlock (processes waiting for an input from another process that never comes) - - Livelock (even worse as it’s harder to detect. All processes shuffling data around but not progressing in the algorithm ) -- More complex software architecture design. - - Can also be combined with threading-technologies as openMP/pthreads for optimal performance. 
diff --git a/src/chapter3/imgs/aaf.png b/src/chapter5/imgs/aaf.png similarity index 100% rename from src/chapter3/imgs/aaf.png rename to src/chapter5/imgs/aaf.png diff --git a/src/chapter3/imgs/aaf_strudel.png b/src/chapter5/imgs/aaf_strudel.png similarity index 100% rename from src/chapter3/imgs/aaf_strudel.png rename to src/chapter5/imgs/aaf_strudel.png diff --git a/src/chapter3/imgs/auth_strudel.png b/src/chapter5/imgs/auth_strudel.png similarity index 100% rename from src/chapter3/imgs/auth_strudel.png rename to src/chapter5/imgs/auth_strudel.png diff --git a/src/chapter5/imgs/batch-processing.jpeg b/src/chapter5/imgs/batch-processing.jpeg new file mode 100644 index 0000000..b6eb6c9 Binary files /dev/null and b/src/chapter5/imgs/batch-processing.jpeg differ diff --git a/src/chapter3/imgs/filezilla_connect_m3.png b/src/chapter5/imgs/filezilla_connect_m3.png similarity index 100% rename from src/chapter3/imgs/filezilla_connect_m3.png rename to src/chapter5/imgs/filezilla_connect_m3.png diff --git a/src/chapter3/imgs/filezilla_sitemanager.png b/src/chapter5/imgs/filezilla_sitemanager.png similarity index 100% rename from src/chapter3/imgs/filezilla_sitemanager.png rename to src/chapter5/imgs/filezilla_sitemanager.png diff --git a/src/chapter3/imgs/gurobi.png b/src/chapter5/imgs/gurobi.png similarity index 100% rename from src/chapter3/imgs/gurobi.png rename to src/chapter5/imgs/gurobi.png diff --git a/src/chapter3/imgs/gurobi2.png b/src/chapter5/imgs/gurobi2.png similarity index 100% rename from src/chapter3/imgs/gurobi2.png rename to src/chapter5/imgs/gurobi2.png diff --git a/src/chapter5/imgs/interactive-processing.png b/src/chapter5/imgs/interactive-processing.png new file mode 100644 index 0000000..fdfb2e9 Binary files /dev/null and b/src/chapter5/imgs/interactive-processing.png differ diff --git a/src/chapter5/imgs/parallel-distributed.png b/src/chapter5/imgs/parallel-distributed.png new file mode 100644 index 0000000..2c7b8c2 Binary files /dev/null and b/src/chapter5/imgs/parallel-distributed.png differ diff --git a/src/chapter3/imgs/putty_key_not_cached.png b/src/chapter5/imgs/putty_key_not_cached.png similarity index 100% rename from src/chapter3/imgs/putty_key_not_cached.png rename to src/chapter5/imgs/putty_key_not_cached.png diff --git a/src/chapter3/imgs/putty_start.png b/src/chapter5/imgs/putty_start.png similarity index 100% rename from src/chapter3/imgs/putty_start.png rename to src/chapter5/imgs/putty_start.png diff --git a/src/chapter3/imgs/strudel1.png b/src/chapter5/imgs/strudel1.png similarity index 100% rename from src/chapter3/imgs/strudel1.png rename to src/chapter5/imgs/strudel1.png diff --git a/src/chapter3/imgs/strudel2.png b/src/chapter5/imgs/strudel2.png similarity index 100% rename from src/chapter3/imgs/strudel2.png rename to src/chapter5/imgs/strudel2.png diff --git a/src/chapter3/imgs/strudel_home.png b/src/chapter5/imgs/strudel_home.png similarity index 100% rename from src/chapter3/imgs/strudel_home.png rename to src/chapter5/imgs/strudel_home.png diff --git a/src/chapter3/bash.md b/src/chapter5/job-scripting.md similarity index 98% rename from src/chapter3/bash.md rename to src/chapter5/job-scripting.md index aada975..ad84c9e 100644 --- a/src/chapter3/bash.md +++ b/src/chapter5/job-scripting.md @@ -1,4 +1,6 @@ -# Bash Scripts +# Job Scripting + +## Bash Scripts Bash is both a command line interface and a scripting language. Linux commands are generally using Bash. Bash scripts are a series of commands that are executed in order. 
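As a tiny illustration of that idea, here is a hypothetical script (the file name `hello.sh` and its contents are only an example) that simply runs a few commands from top to bottom.

```bash
#!/bin/bash
# hello.sh - a handful of commands that run one after another

echo "Running on $(hostname)"
date
echo "Files in the current directory:"
ls -l
```

You would make it executable with `chmod +x hello.sh` and run it with `./hello.sh`.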
Bash scripts are useful for automating tasks that you do often, or for running a series of commands that you don't want to type out every time. In our case, Bash scripts are used for running jobs on M3. diff --git a/src/chapter3/login.md b/src/chapter5/login.md similarity index 65% rename from src/chapter3/login.md rename to src/chapter5/login.md index c1b91e5..0d92da9 100644 --- a/src/chapter3/login.md +++ b/src/chapter5/login.md @@ -78,3 +78,35 @@ a ticket for your issue. ``` Once you are done and want to logout, just type `exit`. This will close the connection. + +# Strudel + +STRUDEL is a web application used to connect to M3. There are two main benefits to this over regular ssh. Firstly, you are able to access a desktop session, so you can interact more easily with M3, look at graphs, etc. STRUDEL also enables the use of Jupyter notebooks, which are especially useful for data science and machine learning. + +## Accessing STRUDEL + +First, go to the [STRUDEL](https://beta.desktop.cvl.org.au/) website. You should see something like this: + +![strudel select cvl](imgs/strudel1.png) + +Select the CVL option, and you should be taken to another page, where you choose how to log in. + +![strudel login](imgs/strudel2.png) + +Select AAF. On the next page, search for and select Monash University. + +![AAF Login](imgs/aaf_strudel.png) + +You will now be taken to the Monash login page. Once you have logged in, it will show one last page, asking permission to use your details. Click allow, and you will be taken to the STRUDEL home page. + +![strudel home page](imgs/strudel_home.png) + +## Desktop Session + +To start a desktop session using STRUDEL, click on the **Desktop** tab on the side, select your desired options, and click launch. Once the session has started, you will be able to attach to it by clicking on the connect button in the *Pending / Running Desktops* section. + +## Jupyter Notebooks + +Similar to Desktops, if you want a basic Jupyter notebook, click on the **Jupyter Lab** tab, choose how much compute you want, and click launch. + +If you want to have a more customised Jupyter notebook, you can do this by first sshing into M3 and activating conda. Then activate the conda environment `jupyterlab`. Install your desired packages in this environment. Once you have done this, go back to STRUDEL, and launch a **Jupyter Lab - BYO** session. \ No newline at end of file diff --git a/src/chapter5/m3-interface.md b/src/chapter5/m3-interface.md new file mode 100644 index 0000000..8cfe317 --- /dev/null +++ b/src/chapter5/m3-interface.md @@ -0,0 +1,243 @@ +# M3 Interface & Usage + +## Linux Commands + +Even if you are already familiar with Linux, please read through all of these commands, as some are specific to M3. + +### Basic Linux Commands + +| Command | Function | +| --- | --- | +| `pwd` | prints current directory | +| `ls` | prints list of files / directories in current directory (add a `-a` to list everything, including hidden files/directories) | +| `mkdir` | makes a directory | +| `rm <filename>` | deletes *filename*. add `-r` to delete directory. add `-f` to force deletion (be really careful with that one) | +| `cd <directory>` | move directory. | +| `vim` or `nano` | bring up a text editor | +| `cat <filename>` | prints contents of file to terminal | +| `echo` | prints whatever you put after it | +| `chmod <permissions> <file>` | changes permissions of file | +| `cp` | copy a file or directory | +| `mv <source> <destination>` | move or rename file or directory | + +> Note: `.` and `..` are special directories.
`.` is the current directory, and `..` is the parent directory. These can be used when using any command that takes a directory as an argument. Similar to these, `~` is the home directory, and `/` is the root directory. For example, if you wanted to copy something from the parent directory to the home directory, you could do `cp ../<filename> ~/`, without having to navigate anywhere. + +### Cluster Specific Commands + +| Command | Function | Flags | +| --- | --- | --- | +| `show_job` | prints information about your jobs | +| `show_cluster` | prints information about the cluster | +| `user_info` | prints information about your account | +| `squeue` | prints information about your jobs | `-u <username>` to print information about a specific user | +| `sbatch <script>` | submit a job to the cluster | +| `scontrol show job <job-id>` | prints information about a specific job | +| `scancel <job-id>` | cancel a job | + +## M3 Specific Commands + +| Command | Function | +| --- | --- | +| `module load <module>` | load a module | +| `module unload <module>` | unload a module | +| `module avail` | list available modules | +| `module list` | list loaded modules | +| `module spider <module>` | search for a module | +| `module help <module>` | get help for a module | +| `module show <module>` | show details about a module | +| `module purge` | unload all modules | +| `module swap <module1> <module2>` | swap two modules | + +## M3's Shared Filesystem + +When we talk about a shared filesystem, what we mean is that the filesystem that M3 uses allows multiple users or systems to access, manage, and share files and directories over a network, concurrently. It enables users to collaborate on projects, share resources, and maintain a unified file structure across different machines and platforms. In addition to this, it enables the many different compute nodes in M3 to access data from a single source which users also have access to, simplifying the process of running jobs on M3. + +Very simply, the way it works is that the home, project and scratch directories are mounted on every node in the cluster, so they are accessible from any node. + +M3's filesystem consists of three main parts that are important to you. + +### Home Directory + +This is each user's personal directory, which only they have access to. This has a ~10GB allocation, and should store any hidden files, configuration files, or other files that you don't want to share with others. This is backed up nightly. + +### Project Directory + +This is the shared project directory, for all users in MDN to use. This has a ~1TB allocation, and should be used only for project specific files, scripts, and data. This is also backed up nightly, so in the case that you accidentally delete something important, it can be recovered. + +### Scratch Directory + +This is also shared with all users in MDN, and has more allocation (~3TB). You may use this for personal projects, but keep your usage low. In general it is used for temporary files, larger datasets, and should be used for any files that you don't need to keep for a long time. This is not backed up, so if you delete something, it's gone forever. + +### General Rules + +- Keep data usage to a minimum. If you have a large amount of data, consider moving it to the scratch directory. If it is not necessary to keep it, consider deleting it. +- Keep your home directory clean. +- In general, it is good practice to make a directory in the shared directory for yourself. Name this your username or name, to make it easily identifiable. This is where you should store your files for small projects or personal use.
+- The project directory is not for personal use. Do not store files in the project directory that are not related to MDN. Use the scratch directory instead. + +### Copying files to and from M3 + +Copying files to and from M3 can be done in a few different ways. We will go over the basics of scp, as well as setting up FileZilla. + +A key thing to remember when copying files to and from M3 is that you shouldn't be using the regular SSH URL. Instead, they have a dedicated SFTP URL to use for file transfers. This is `m3-dtn.massive.org.au`. This is the URL you will use when setting up FileZilla, and when using scp. + +#### Using scp + +You can copy files to M3 using the `scp` command. This is a command line tool that is built into most Linux distributions. If you are using Windows, you will need to install a tool like [Git Bash](https://gitforwindows.org/) to use this command. + +##### Linux / Mac + +To copy a file to M3, use the following command: + +```bash +scp <path to file> <username>@m3-dtn.massive.org.au:<destination on M3> +``` + +For example, if I wanted to copy a file called `test.txt` to my home directory on M3, I would use the following command: + +```bash +scp test.txt jasparm@m3-dtn.massive.org.au:~ +``` + +To copy a file from M3 to your local machine, use the following command: + +```bash +scp <username>@m3-dtn.massive.org.au:<path to file on M3> <local destination> +``` + +So, to bring that same file back to my local machine, I would use the following command: + +```bash +scp jasparm@m3-dtn.massive.org.au:~/test.txt . +``` + +#### FileZilla + +FileZilla is an SFTP client that the M3 staff recommend using. You can download it [here](https://filezilla-project.org/download.php?show_all=1). + +Once installed, run the program and click on File -> Site Manager or `Ctrl-S`. This will open the site manager. Click on New Site, and enter the following details: + +- Protocol: SFTP +- Host: `m3-dtn.massive.org.au` +- Logon Type: Ask for password +- User: `<your M3 username>` + +Don't change anything else. Leave password blank for now. + +It should look something like this: +![Add M3 as a site](./imgs/filezilla_connect_m3.png) +Click on Connect, and enter your password when prompted. You should now be connected to M3. You can now drag and drop files to and from M3. + +## Software and Tooling + +Software and development tooling is handled a little differently on M3 than you might be used to. In particular, because M3 is a shared system, you do not have access to `sudo`, and you cannot install software on the system manually. Instead, you will need to use the `module` command to load software and development tools. + +### Module + +The `module` command is used kind of as an alternative to package managers like `apt` or `yum`, except it is managed by the M3 team. It allows you to load software and development tools into your environment, and is used to load software on M3. To see a comprehensive list of commands go [here](./linux-cmds.md#m3-specific-commands). + +In general, however, you will only really need to use `module load` and `module unload`. These commands are used to load and unload software and development tools into your environment. + +For most of the more popular software packages, like gcc, there are multiple different versions available. You will need to specify which version you want to load based on your needs.
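For example, before loading a compiler you can check which versions are actually installed. The snippet below only uses the `module avail`, `module spider`, and `module list` commands from the tables above; the exact versions it prints will depend on M3's current software stack.

```bash
# List the versions of gcc that are available to load
module avail gcc

# Search the whole module tree (useful when you don't know the exact module name)
module spider gcc

# Show which modules are currently loaded in your environment
module list
```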
+ +## C + +### GCC + +To load GCC, you can run the following command: + +```bash +module load gcc/10.2.0 +``` + +This will load GCC 10.2.0 into your environment, and you can use it to compile C/C++ programs as described in the [Intro to C](../chapter2/intro-to-c.md) chapter. To unload GCC, you can run the following command: + +```bash +module unload gcc/10.2.0 +``` + +## Python + +Python is a bit of a special case on M3. This is because of how many different versions there are, as well as how many different packages are available. To make things easier, it is recommended that you use miniconda or anaconda to manage your Python environments instead of using the system Python. + +These instructions are based on the M3 docs, which can be found [here](https://docs.massive.org.au/M3/software/pythonandconda/pythonandconda.html#pythonandconda). + +### Miniconda + +#### Installing Miniconda + +To install Miniconda on M3, there is a dedicated install script that you can use. This will install miniconda into your default scratch space, i.e. `/vf38_scratch/<username>/miniconda3`. To install miniconda, run the following command: + +```bash +module load conda-install + +# To install miniconda to the default location +conda-install + +# To install miniconda to a custom location +conda-install your/install/location +``` + +#### Activating Miniconda + +To activate the base conda environment, run the following command: + +```bash +source your/install/location/miniconda/bin/activate +``` + +You will notice that once activated, `(base)` will appear in the prompt before your username. + +To create and activate Python environments within Miniconda, follow these steps: + +```bash +# Create a new environment +# Change env-name to whatever you want to call your environment +conda create --name env-name python=<version> + +# Activate the environment +conda activate env-name +``` + +#### Managing Python packages + +Use the following commands to install and manage Python packages: + +```bash +# Install a package +conda install package-name + +# Update a package +conda update package-name + +# You can also change the version of packages by adding a = and the version number + +# Remove a package +conda remove package-name +``` + +#### Deactivating Miniconda + +To deactivate the conda environment you are in, run `conda deactivate`. To exit conda entirely run `conda deactivate` again. You will know you have fully exited conda when `(base)` is no longer in the prompt. + +### VIM + +VIM is a terminal-based text editor. You may have heard about it, or even tried using it before. If so, you might recognise the common meme of "how do I exit VIM???". This is because VIM uses a very different key binding system to other text editors, and it can be a little confusing to get used to. However, once you get used to it, it is actually a very powerful and efficient text editor. + +I will attempt to give a brief overview of VIM commands, however you should really check out the [VIM documentation](https://vimhelp.org/) if you want to learn more. + +VIM also has built-in help that you can access by running `:help` while in VIM. + +To use VIM to edit a file, just type `vim <filename>` into the terminal. This will open the file in VIM. If the file does not exist, it will create a new file with that name. + +VIM has three different modes. The first is the command mode, which is the default mode when you open a file. In this mode, you can navigate around the file, and perform other commands.
The second is the insert mode, which is used to insert text into the file. The third is the visual mode, which is used to select text. + +To enter the insert mode, press `i`. To exit the insert mode, press `esc`. To enter the visual mode, press `v`. To exit the visual mode, press `esc`. + +In command mode, you move around using `h`, `j`, `k`, `l`. To move along words, press `w` or `b`. To move to the start or end of the line, press `0` or `$`. You can delete a line using `dd`, or delete a word using `dw`. You might be noticing some patterns here. In VIM, commands are made up of single or multiple characters that represent different things. For example, if I wanted to delete a word, I would press `d` to delete, and then `w` to delete a word. If I wanted to delete 3 words, I would press `d3w`. If I just wanted to change a word, I would press `c` instead of `d`. If I wanted to change 3 words, I would press `c3w`. If I wanted to change a line, I would press `cc`. Some other useful command mode commands are `u` to undo, `o` to insert a new line and go into insert mode, and `?` to search for a string. + +To get to insert mode, there are lots of different ways, but the most common are `i` to insert text before the cursor, `a` to insert text after the cursor, and `o` to insert a new line. The capital versions of these do related things: `I` inserts text at the start of the line, `A` inserts text at the end of the line, and `O` inserts a new line above the current line. To exit insert mode, press `esc`. + +To get to visual mode, press `v`. In visual mode, you can select text using the same commands as in command mode. To delete the selected text, press `d`. To change the selected text, press `c`. To copy the selected text, press `y`. To paste, press `p`. To exit visual mode, press `esc`. + +To exit VIM itself, enter command mode, and then type `:q!`. This will exit VIM without saving any changes. To save and exit, type `:wq`. To save without exiting, type `:w`. diff --git a/src/chapter5/message-passing.md b/src/chapter5/message-passing.md deleted file mode 100644 index f6d8742..0000000 --- a/src/chapter5/message-passing.md +++ /dev/null @@ -1,11 +0,0 @@ -# Message Passing - -As each processor has its own local memory with its own address space in distributed computing, we need a way to communicate between the processes and share data. Message passing is the mechanism of exchanging data across processes. Each process can communicate with one or more other processes by sending messages over a network. - -The MPI (message passing interface) in OpenMPI is a communication protocol standard defining message passing between processors in distributed environments and are implemented by different groups with the main goals being high performance, scalability, and portability. - -OpenMPI is one implementation of the MPI standard. It consists of a set of headers library functions that you call from your program. i.e. C, C++, Fortran etc. - -For C, you will need a header file for all the functions (mpi.h) and link in the relevant library functions. This is all handled by the mpicc program (or your compiler if you wanted to specify all the paths). - -In the next chapter we will look at how to implement message passing using OpenMPI.
diff --git a/src/chapter5/parallel-distributed.md b/src/chapter5/parallel-distributed.md new file mode 100644 index 0000000..3f66be4 --- /dev/null +++ b/src/chapter5/parallel-distributed.md @@ -0,0 +1,32 @@ +# Parallel & Distributed Computing + +Nearly all modern computer systems utilise parallel computing to speed up the execution of algorithms. To see how this works in practice, look at the diagram below. + +![parallel vs. distributed](imgs/parallel-distributed.png) + +As you can see, in a scenario where a program (job) takes 3 seconds and 3 independent jobs have to be executed by a system, doing them serially on a single computer takes a total of 9 seconds, but doing them simultaneously across 3 computers takes only 3 seconds, thus achieving a 3x speedup through parallel computing. + +This is the fundamental principle that High Performance Computing is based on. + +## What is Distributed Computing? + +**Distributed computing is parallel execution on distributed memory architecture.** + +This essentially means it is a form of parallel computing, where the processing power is spread across multiple machines in a network rather than being contained within a single system. In this memory architecture, the problems are broken down into smaller parts, and each machine is assigned to work on a specific part. + +![distributed memory architecture](imgs/distributed_memory_architecture.png) + +### Distributed Memory Architecture + +Let's have a look at the distributed memory architecture in more detail. + +- Each processor has its own local memory, with its own address space +- Data is shared via a communications network using a network protocol, e.g. Transmission Control Protocol (TCP), InfiniBand, etc. + +![Distributed Memory Architecture](imgs/distributed_memory_architecture_2.png) + +Each machine or **node** is connected to the HPC cluster via a network, typically one with high bandwidth and low latency. The fact that these are largely independent computers connected over a network, rather than a set of CPU/GPU cores in the same computer (as in parallel computing), presents a set of disadvantages. + +__Advantages of parallel & local computing:__ +- No **data transfer latency** & I/O throughput bottleneck. The system bus inside a machine has far higher bandwidth and lower latency than even the fastest computer networks. +- \ No newline at end of file diff --git a/src/chapter5/parallel-refresher.md b/src/chapter5/parallel-refresher.md deleted file mode 100644 index c4dbc3c..0000000 --- a/src/chapter5/parallel-refresher.md +++ /dev/null @@ -1,31 +0,0 @@ -# Refresher on Parallelism - -## Task Parallelism - -We saw in the last chapter parallel computing can be used to solve problems by executing code in parallel as opposed to in series. - -![Task parallelism](imgs/task_parallelism.jpg) - -## Data Parallelism - -Note that not all programs can be broken down into independent tasks and we might instead data parallelism like the following. - -![Data parallelism](imgs/data_parallelism.jpg) - -## Parallel computing example - -Think back to the example below which was provided in the last chapter. We will look at the cost of memory transactions soon. - -![Parallel computing example](imgs/parallel_computing_arrays_eg.png) - -## Parallel Scalability - -The speed up achieved from parallelism is dictated by your algorithm. Notably the serial bits of your algorithm can not be sped up by increasing the number of processors.
The diagram below looks at the benefits we can achieve from writing parallel code as the number of processes increases. - -![Parallel scalability](imgs/parallel_scalability.jpg) - -## Memory Architectures - -Lastly, the different memory architectures we looked at in the last section included shared memory, distributed memory and hybrid architectures. We have looked at shared memory in detail and now we will dive into distributed memory architecture. - -![Memory architectures](imgs/memory_architectures.jpg) diff --git a/src/chapter3/slurm.md b/src/chapter5/slurm.md similarity index 100% rename from src/chapter3/slurm.md rename to src/chapter5/slurm.md diff --git a/src/chapter4/challenges.md b/src/chapter8/challenges.md similarity index 82% rename from src/chapter4/challenges.md rename to src/chapter8/challenges.md index 33b071c..0d9c4ce 100644 --- a/src/chapter4/challenges.md +++ b/src/chapter8/challenges.md @@ -1,16 +1,5 @@ # Parallel Computing Challenges -## Overview - -- [Parallel Computing Challenges](#parallel-computing-challenges) - - [Overview](#overview) - - [Pre-Tasks](#pre-tasks) - - [Task 1 - Single Cluster Job using OpenMP](#task-1---single-cluster-job-using-openmp) - - [Task 2 - Parallel `for` Loop](#task-2---parallel-for-loop) - - [Task 3 - Parallel Reductions](#task-3---parallel-reductions) - - [Task 4 - Laplace Equation for Calculating the Temperature of a Square Plane](#task-4---laplace-equation-for-calculating-the-temperature-of-a-square-plane) - - [Task 5 - Calculate Pi using "Monte Carlo Algorithm"](#task-5---calculate-pi-using-monte-carlo-algorithm) - ## Pre-Tasks Make sure to clone a copy of **your** challenges repo onto M3, ideally in a personal folder on vf38_scratch. @@ -19,7 +8,7 @@ Make sure to clone a copy of **your** challenges repo onto M3, ideally in a pers ## Task 1 - Single Cluster Job using OpenMP -Create a program in `hello.c` that prints 'Hello, world from thread: ' to the output. Launch the job to a node SLURM. +Create a program in `hello.c` that prints 'Hello, world from thread: ' to the output. Launch the job to a node using SLURM. Next, extend the program to run across multiple nodes using OpenMPI. > Note: > @@ -28,7 +17,7 @@ Create a program in `hello.c` that prints 'Hello, world from thread: Hint: You will likely need to allocate memory from the heap. diff --git a/src/chapter8/chapter8.md b/src/chapter8/chapter8.md new file mode 100644 index 0000000..95c1d02 --- /dev/null +++ b/src/chapter8/chapter8.md @@ -0,0 +1,7 @@ +# Parallel Computing + +In this chapter, we will discuss the abstraction of parallel computing. To facilitate our exploration, we will employ an API within the C Programming Language: OpenMP. This tool will serve as a means to concretely illustrate the underlying language-independent theory. + +**Parallel computing is about executing the instructions of the program simultaneously.** + +One of the core values of computing is the breaking down of a big problem into smaller, easier-to-solve problems, or at least smaller problems. In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than sequentially (in order).
diff --git a/src/chapter8/imgs/barrier-end.png b/src/chapter8/imgs/barrier-end.png new file mode 100755 index 0000000..e4b54a7 Binary files /dev/null and b/src/chapter8/imgs/barrier-end.png differ diff --git a/src/chapter8/imgs/barrier-wait.png b/src/chapter8/imgs/barrier-wait.png new file mode 100755 index 0000000..fd3f02c Binary files /dev/null and b/src/chapter8/imgs/barrier-wait.png differ diff --git a/src/chapter8/imgs/barrier.png b/src/chapter8/imgs/barrier.png new file mode 100755 index 0000000..10b3c87 Binary files /dev/null and b/src/chapter8/imgs/barrier.png differ diff --git a/src/chapter8/imgs/deadlock.png b/src/chapter8/imgs/deadlock.png new file mode 100644 index 0000000..6a8bd30 Binary files /dev/null and b/src/chapter8/imgs/deadlock.png differ diff --git a/src/chapter8/imgs/explicit-barrier.png b/src/chapter8/imgs/explicit-barrier.png new file mode 100755 index 0000000..0fb836a Binary files /dev/null and b/src/chapter8/imgs/explicit-barrier.png differ diff --git a/src/chapter4/imgs/Threads Visualisation.png b/src/chapter8/imgs/fork-join.png similarity index 100% rename from src/chapter4/imgs/Threads Visualisation.png rename to src/chapter8/imgs/fork-join.png diff --git a/src/chapter8/imgs/htop.png b/src/chapter8/imgs/htop.png new file mode 100644 index 0000000..cbc1fd3 Binary files /dev/null and b/src/chapter8/imgs/htop.png differ diff --git a/src/chapter8/imgs/mpi-routines.png b/src/chapter8/imgs/mpi-routines.png new file mode 100644 index 0000000..20768d6 Binary files /dev/null and b/src/chapter8/imgs/mpi-routines.png differ diff --git a/src/chapter8/imgs/one-thread-counter.png b/src/chapter8/imgs/one-thread-counter.png new file mode 100644 index 0000000..0386ee1 Binary files /dev/null and b/src/chapter8/imgs/one-thread-counter.png differ diff --git a/src/chapter4/imgs/OpenMP and Directive.png b/src/chapter8/imgs/program-structure.png similarity index 100% rename from src/chapter4/imgs/OpenMP and Directive.png rename to src/chapter8/imgs/program-structure.png diff --git a/src/chapter8/imgs/time.png b/src/chapter8/imgs/time.png new file mode 100644 index 0000000..b9f5185 Binary files /dev/null and b/src/chapter8/imgs/time.png differ diff --git a/src/chapter8/imgs/two-threads-counter.png b/src/chapter8/imgs/two-threads-counter.png new file mode 100644 index 0000000..e83f3fd Binary files /dev/null and b/src/chapter8/imgs/two-threads-counter.png differ diff --git a/src/chapter8/locks.md b/src/chapter8/locks.md new file mode 100644 index 0000000..78052d6 --- /dev/null +++ b/src/chapter8/locks.md @@ -0,0 +1,199 @@ +# Locks + +Earlier, we learnt how to write concurrent programs, as well as a few constructs to achieve **synchronisation** in OpenMP. We know that: +- `reduction construct` partitions shared data and uses barrier to achieve synchronisation +- `atomic construct` utilises hardware ability to achieve thread-safe small memory read/write operations. + +What about `critical construct`? We said that it uses locks, but what are locks? + +> Note that the direct use of locks is **not recommended** (at least in OpenMP): +> - It is very easy to cause deadlock or hard-to-debug livelock (more on these at the end of this sub-chapter). +> - It can often cause very poor performance or worse. +> - It generally indicates that the program design is wrong. +> +> We will explore them because it is important to know what is happening under the hood of the high-level APIs. + +## Overall Idea + +A lock is a synchronization technique.
A lock is an abstraction that allows at most one thread to own it at a time. To be more concrete, let's say we have a segment of code, guarded by a **lock**. Then, exactly 1 thread can execute those lines of code at a time (sounds familiar?). Any other threads (without the lock) trying to access the code segment will have to wait until the lock is released. + +## OpenMP Locks + +Let's start with an example: + +```c +#include <stdio.h> +#include <omp.h> + +int total = 0; +int n = 100; +int nums[100]; +omp_lock_t lock; // uninitialized + +int main() { + + omp_init_lock(&lock); // the lock is initialized but unlocked + + // Populate nums + for (int i = 0; i < n; i++) { + nums[i] = i; + } + +#pragma omp parallel for + for (int i = 0; i < n; i++) { + int temp = nums[i]; + + omp_set_lock(&lock); // a thread changes the lock's state to locked + + total += temp; // something that we want only 1 thread to execute at a time + + omp_unset_lock(&lock); // the thread owning the lock changes the lock's state to unlocked + } + omp_destroy_lock(&lock); + printf("%d\n", total); +} +``` + +An OpenMP lock can exist in three states: **uninitialized**, **unlocked**, or **locked**. When in the unlocked state, a task can acquire the lock, transitioning it to the locked state. The task acquiring the lock is considered its owner. An owning task can release the lock, reverting it to the unlocked state. Any attempt by a task to release a lock it does not own renders the program non-conforming. + +There are two types of locks supported: simple locks and nested locks: +- Nested locks allow for multiple acquisitions before unlocking. They remain locked until unset as many times as `omp_set_nest_lock` has been called. Nested locks facilitate scenarios where functions call other functions utilizing the same lock. +- Simple locks should be acquired only once using `omp_set_lock` and released with a single call to `omp_unset_lock`. + +## Deadlocks + +When used correctly and cautiously, locks can effectively prevent race conditions. However, there's another issue to be aware of. Because using locks means that threads have to wait (blocking when another thread holds the lock), there's a risk of a situation where two threads end up waiting for each other, leading to a stalemate where neither can progress. + +Let's look at this code: + +```c +#include <stdio.h> +#include <omp.h> + +omp_lock_t lock1, lock2; + +int main() { + omp_init_lock(&lock1); + omp_init_lock(&lock2); + +#pragma omp parallel num_threads(2) + { + int thread_id = omp_get_thread_num(); + + if (thread_id == 0) { + omp_set_lock(&lock1); // Thread 0 takes lock 1 + printf("Thread %d acquired lock1\n", thread_id); + omp_set_lock(&lock2); // Attempt to take lock 2 (but it already belongs to thread 1 => wait) + printf("Thread %d acquired lock2\n", thread_id); + omp_unset_lock(&lock2); + omp_unset_lock(&lock1); + } + else { + omp_set_lock(&lock2); // Thread 1 takes lock 2 + printf("Thread %d acquired lock2\n", thread_id); + omp_set_lock(&lock1); // Attempt to take lock 1 (but it already belongs to thread 0 => wait) + printf("Thread %d acquired lock1\n", thread_id); + omp_unset_lock(&lock1); + omp_unset_lock(&lock2); + } + } + + omp_destroy_lock(&lock1); + omp_destroy_lock(&lock2); + + return 0; +} +``` + +The output should be something like this: + +![Deadlock](./imgs/deadlock.png) + +The program is not terminated. However, no thread is making any progress as they are being blocked by each other at the same time!
Deadlock is not limited to just two threads; the key characteristic of deadlock is a cycle of dependencies: +- A is waiting for B +- B is waiting for C +- C is waiting for A + +In such a loop, none of the threads can move forward. + +## Livelocks + +A more challenging issue that may arise is livelock. Similar to deadlock, livelocked threads are unable to make progress. However, unlike deadlock, where threads are blocked, livelocked threads remain active. They're caught in a continuous and infinite sequence of responding to each other, preventing them from making any meaningful progress in their work. + +```c +#include <stdio.h> +#include <omp.h> +#include <unistd.h> + +omp_lock_t lock1, lock2; + +void execute_task(int task_number) { + omp_lock_t* first_lock; + omp_lock_t* second_lock; + const char* lock1_message; + const char* lock2_message; + + if (task_number == 1) { + first_lock = &lock1; + second_lock = &lock2; + lock1_message = "lock1"; + lock2_message = "lock2"; + } + else { + first_lock = &lock2; + second_lock = &lock1; + lock1_message = "lock2"; + lock2_message = "lock1"; + } + + while (1) { + omp_set_lock(first_lock); + printf("%s acquired, trying to acquire %s.\n", lock1_message, lock2_message); + + // sleep for 50 milliseconds to illustrate some meaningful tasks, + // and to ensure that the order of lock and unlock cannot correct itself by chance + usleep(50000); + + if (omp_test_lock(second_lock)) { + printf("%s acquired.\n", lock2_message); + } + else { + printf("cannot acquire %s, releasing %s.\n", lock2_message, lock1_message); + omp_unset_lock(first_lock); + continue; + } + + printf("executing task %d.\n", task_number); + break; + } + omp_unset_lock(second_lock); + omp_unset_lock(first_lock); +} + +int main() { + omp_init_lock(&lock1); + omp_init_lock(&lock2); + +// each section will be executed in parallel +#pragma omp parallel sections + { +#pragma omp section + { + execute_task(1); + } + +#pragma omp section + { + execute_task(2); + } + } + + omp_destroy_lock(&lock1); + omp_destroy_lock(&lock2); + + return 0; +} + +``` diff --git a/src/chapter5/openmpi.md b/src/chapter8/message-passing.md similarity index 76% rename from src/chapter5/openmpi.md rename to src/chapter8/message-passing.md index 6e89875..d893137 100644 --- a/src/chapter5/openmpi.md +++ b/src/chapter8/message-passing.md @@ -1,6 +1,16 @@ -# OpenMPI +# Message Passing -## Primary MPI Routines +As each processor has its own local memory with its own address space in distributed computing, we need a way to communicate between the processes and share data. Message passing is the mechanism of exchanging data across processes. Each process can communicate with one or more other processes by sending messages over a network. + +MPI (Message Passing Interface) is a communication protocol standard that defines message passing between processors in distributed environments. It is implemented by different groups, with the main goals being high performance, scalability, and portability. + +OpenMPI is one implementation of the MPI standard. It consists of a set of headers and library functions that you call from your program, i.e. from C, C++, Fortran, etc. + +For C, you will need the header file `mpi.h`, which declares all the functions, and to link in the relevant library. This is all handled by the mpicc program (or your compiler if you wanted to specify all the paths).
+ +## OpenMPI + +### Primary MPI Routines ``` C int MPI_Init(int * argc, char ** argv); @@ -22,9 +32,9 @@ int MPI_Comm_rank(MPI_Comm comm, int \* rank); // rank contains the value for that process- the function return value is an error code ``` -![MPI routines](imgs/mpi_routines.png) +![MPI routines](imgs/mpi-routines.png) -### Point-to-Point communication +#### Point-to-Point communication These are blocking functions - they wait until the message is sent or received. Note that the CPU is actively polling the network interface when waiting for a message. This is opposite in behaviour to other C functions, i.e. c= getChar() (which causes a context switch and then a sleep in the OS). This is done for speed reasons. @@ -82,7 +92,7 @@ OUTPUT PARAMETERS - ```IERROR``` - Fortran only: Error status (integer). -### Primary MPI Routines closing +#### Primary MPI Routines closing In a header file you will find @@ -97,7 +107,7 @@ To call in your C or C++ program MPI_Finalize(); ``` -## General overview MPI program +### General overview MPI program ``` C ... @@ -119,27 +129,27 @@ Use man pages to find out more about each routine When sending a Process it packs up all of its necessary data into a buffer for the receiving process. These buffers are often referred to as envelopes since the data is being packed into a single message before transmission (similar to how letters are packed into envelopes before transmission to the post office) -## Elementary MPI Data types +### Elementary MPI Data types MPI_Send and MPI_Recv utilize MPI Datatypes as a means to specify the structure of a message at a higher level. The data types defined in the table below are simple in nature and for custom data structures you will have to define the structure. -| MPI datatype | C equivalent | -|-------------------------|------------------------| -| MPI_SHORT | short int | -| MPI_INT | int | -| MPI_LONG | long int | -| MPI_LONG_LONG | long long int | -| MPI_UNSIGNED_CHAR | unsigned char | -| MPI_UNSIGNED_SHORT | unsigned short int | -| MPI_UNSIGNED | unsigned int | -| MPI_UNSIGNED_LONG | unsigned long int | -| MPI_UNSIGNED_LONG_LONG | unsigned long long int | -| MPI_FLOAT | float | -| MPI_DOUBLE | double | -| MPI_LONG_DOUBLE | long double | -| MPI_BYTE | char | - -## Example of a simple program +| MPI datatype | C equivalent | +| ---------------------- | ---------------------- | +| MPI_SHORT | short int | +| MPI_INT | int | +| MPI_LONG | long int | +| MPI_LONG_LONG | long long int | +| MPI_UNSIGNED_CHAR | unsigned char | +| MPI_UNSIGNED_SHORT | unsigned short int | +| MPI_UNSIGNED | unsigned int | +| MPI_UNSIGNED_LONG | unsigned long int | +| MPI_UNSIGNED_LONG_LONG | unsigned long long int | +| MPI_FLOAT | float | +| MPI_DOUBLE | double | +| MPI_LONG_DOUBLE | long double | +| MPI_BYTE | char | + +### Example of a simple program ``` C @@ -195,7 +205,7 @@ int main(int argc, char *argv[]) } ``` -## Compilation and Linking +### Compilation and Linking - Make sure you have the following packages installed and that they are in your $PATH: - gcc @@ -210,9 +220,9 @@ int main(int argc, char *argv[]) - mpicc is just a wrapper around a C compiler. 
To see what it does type: - ```mpicc –showme``` -### sbatch to send job to compute nodes using SLURM +#### sbatch to send job to compute nodes using SLURM -``` bash +```bash #!/bin/bash #SBATCH --job-name=Vaccinator #SBATCH --ntasks=4 @@ -231,7 +241,7 @@ mpirun -np 4 ./my-awesome-program - ntasks-per-node Controls the maximum number of tasks per allocated node - cpus-per-task Controls the number of CPUs allocated per task -## Measuring performance +### Measuring performance - ```htop``` to check the CPU usage. You need to run this command while the process is running - If you are using SLURM, you will need to use ```squeue``` or ```scontrol``` to find the compute node it is running on and then ssh into it. diff --git a/src/chapter8/multithreading.md b/src/chapter8/multithreading.md new file mode 100644 index 0000000..34a8dd9 --- /dev/null +++ b/src/chapter8/multithreading.md @@ -0,0 +1,106 @@ +# Multithreading + +We have all looked at the theory of threads and concurrent programming in the Operating System chapter. Now, we will shift our focus to OpenMP and its application for executing multithreaded operations in a declarative programming style. + +## OpenMP + +OpenMP is an Application Program Interface (API) that is used to explicitly direct multi-threaded, shared memory parallelism in C/C++ programs. It is not intrusive on the original serial code, as the OpenMP instructions are written as pragmas that are interpreted by the compiler. + +> Further features of OpenMP will be introduced in conjunction with the concepts discussed in later sub-chapters. + +### Fork-Join Parallel Execution Model + +OpenMP uses the `fork-join model` of parallel execution. + +* **FORK**: All OpenMP programs begin with a `single master thread` which executes sequentially until a `parallel region` is encountered, when it creates a team of parallel threads. + +The OpenMP runtime library maintains a pool of threads that can be added to the threads team in parallel regions. When a thread encounters a parallel construct and needs to create a team of more than one thread, the thread will check the pool and grab idle threads from the pool, making them part of the team. + +* **JOIN**: Once the team threads complete the parallel region, they `synchronise` and return to the pool, leaving only the master thread that executes sequentially. + +![Fork - Join Model](./imgs/fork-join.png) + +> We will look a bit more into what synchronisation is, as well as synchronisation techniques, in the next sub-chapter. + +### Imperative vs Declarative + +Imperative programming specifies and directs the control flow of the program. On the other hand, declarative programming specifies the expected result and core logic without directing the program's control flow. + +OpenMP follows a declarative programming style. Instead of manually creating, managing, synchronizing, and terminating threads, we can achieve the desired outcome by simply declaring it using pragmas. + +![Structure Overview](./imgs/program-structure.png) + +### Working with OpenMP + +We will now look at a simple example. + +> The code can be compiled with `gcc -fopenmp -o hello hello.c`. + +```c +#include <stdio.h> +#include <omp.h> + +int main() { + int i; + #pragma omp parallel for + for (i = 0; i < 10; i++) { + printf("Thread %d executing iteration %d\n", omp_get_thread_num(), i); + } + return 0; +} +``` + +## Running on M3 + +Here is a template script provided in the home directory on M3.
Notice that we can dynamically change the number of threads using `export OMP_NUM_THREADS=12` + +```bash +#!/bin/bash +# Usage: sbatch slurm-openmp-job-script +# Prepared By: Kai Xi, Apr 2015 +# help@massive.org.au + +# NOTE: To activate a SLURM option, remove the whitespace between the '#' and 'SBATCH' + +# To give your job a name, replace "MyJob" with an appropriate name +# SBATCH --job-name=MyJob + + +# To set a project account for credit charging, +# SBATCH --account=pmosp + + +# Request CPU resource for an OpenMP job, suppose it is a 12-thread job +# SBATCH --ntasks=1 +# SBATCH --ntasks-per-node=1 +# SBATCH --cpus-per-task=12 + +# Memory usage (MB) +# SBATCH --mem-per-cpu=4000 + +# Set your minimum acceptable walltime, format: day-hours:minutes:seconds +# SBATCH --time=0-06:00:00 + + +# To receive an email when job completes or fails +# SBATCH --mail-user=<your email address> +# SBATCH --mail-type=END +# SBATCH --mail-type=FAIL + + +# Set the file for output (stdout) +# SBATCH --output=MyJob-%j.out + +# Set the file for error log (stderr) +# SBATCH --error=MyJob-%j.err + + +# Use reserved node to run job when a node reservation is made for you already +# SBATCH --reservation=reservation_name + + +# Command to run an OpenMP job +# Set OMP_NUM_THREADS to the same value as: --cpus-per-task=12 +export OMP_NUM_THREADS=12 +./your_openmp_program +``` diff --git a/src/chapter8/parallelism.md b/src/chapter8/parallelism.md new file mode 100644 index 0000000..77e7d21 --- /dev/null +++ b/src/chapter8/parallelism.md @@ -0,0 +1 @@ +# Types of Parallelism diff --git a/src/chapter8/synchronisation.md b/src/chapter8/synchronisation.md new file mode 100644 index 0000000..132f56f --- /dev/null +++ b/src/chapter8/synchronisation.md @@ -0,0 +1,257 @@ +# Synchronisation + +Definition: Synchronisation is the task of coordinating multiple processes (or threads) to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of actions. + +## Race Condition + +Let's start with this simple program: + +```c +/* +We purposefully added the following code within the program: +- The sleep() calls allow thread switching in the middle of function calls. +- The silly variable assignments in increment() mimic the register. +- All functions are sharing a global counter variable. + +Note that: +- Even if we remove all of the sleep() and the variable assignments, +the error can still occur by chance. + +What should be the desired output? +What is the actual output? +*/ +#include <stdio.h> +#include <omp.h> +#include <unistd.h> + +unsigned int sleep_time = 1; // seconds (sleep() takes a whole number of seconds) +int counter = 0; // Sharing across the program + +int get_value() { + sleep(sleep_time); // This will cause thread switching + printf("Current Counter = %d\n", counter); + return counter; +} + +void increment() { + int temp = counter; // Load counter to register + sleep(sleep_time); // This will cause thread switching + temp++; // Increment the register + counter = temp; // Store back to the variable + + printf("Incremented counter to %d\n", counter); +} + +int main() { +#pragma omp parallel for + for (int i = 0; i < 5; i++) { + increment(); + get_value(); + } + + return 0; +} +``` + +### Single Thread + +Running the program using 1 thread: +```bash +export OMP_NUM_THREADS=1 +./counter +``` +The output should look something like this: + +![1 thread counter](./imgs/one-thread-counter.png) + +The output matches what we expected. +- This is because we only used a single thread. +- The program is just a sequential program without any parallelism.
- `sleep()` calls simply put the thread to sleep; the same thread wakes up and continues the execution. + +### Multiple Threads + +```bash +export OMP_NUM_THREADS=2 +./counter +``` + +Running the program using 2 threads may give us this output (this is just 1 **possible** output): + +![2 threads counter](./imgs/two-threads-counter.png) + +What is happening here? +- We are using 2 threads. +- Both threads are trying to access the global variable `counter` at the same time (roughly). +- During the time when 1 thread is sleeping, the other thread may increment the shared counter. +- The 2 threads simply go on their way and do not coordinate with each other. + +> What we have here is a `Race Condition`. A race condition occurs when two or more threads can access `shared data` and they try to `change it at the same time`. + +### How to resolve the problem? + +There are a few ways we can resolve the race condition in OpenMP: + +* **Critical construct**: This restricts the code so that only one thread can do something at a time (in our example, only 1 thread can increment the counter at a time). However, it is `bad for performance` and possibly destroys a lot of the gains from running code in parallel in the first place. + +```c +int main() { +#pragma omp parallel for + for (int i = 0; i < 5; i++) { +#pragma omp critical // Critical construct + increment(); + get_value(); + } + return 0; +} +``` + +* **Atomic construct**: This is quite similar to the critical construct, however, it only applies to memory read/write operations. It has better performance than the critical construct by taking advantage of the hardware. There's no lock/unlock needed on entering/exiting the line of code; it just performs the atomic operation, which the hardware guarantees can't be interfered with. Let's look at another example: + +> Run this program multiple times using multiple threads (before uncommenting the construct). Again, race condition! + +```c +#include <stdio.h> +#include <omp.h> + +int total = 0; +int n = 100; +int nums[100]; + +int main() { + // Populate nums + for (int i = 0; i < n; i++) { + nums[i] = i; + } + +#pragma omp parallel for + for (int i = 0; i < n; i++) { + int temp = nums[i]; + /* + We can easily resolve the race condition with atomic/critical construct. + The atomic one will work perfectly and give better performance this time. + Uncomment the construct below to resolve the race condition. + */ +// #pragma omp atomic + total += temp; + } + printf("%d\n", total); +} +``` + +* **Reduction**: Depending on the problem, sometimes the best solution will be to use `reduction`. Let's analyse what this code is doing: + +> Using `reduction` here results in significantly better performance. +> - A quick way to do some simple benchmarking is `time <command>` +> - Conduct benchmarking for the 3 versions, and try different numbers of threads + +Example: +```bash +# Tuning the number of threads +export OMP_NUM_THREADS=4 + +# Change according to your file's name +time ./critical +time ./atomic +time ./reduction +``` + +```c +#include <stdio.h> +#include <omp.h> + +int total = 0; +int n = 100; +int nums[100]; + +int main() { + // Populate nums + for (int i = 0; i < n; i++) { + nums[i] = i; + } + +#pragma omp parallel for reduction(+:total) num_threads(3) + for (int i = 0; i < n; i++) { + int temp = nums[i]; + total += temp; + } + printf("Final total is: %d\n", total); +} +``` + +> Notice that: +> - The previous two approaches only allow 1 thread at a time to perform some operations.
> - Reduction allows threads to access the same shared data at the same time, but in different parts of the data. +> +> The nature of the word **synchronisation** in these two examples is completely different, while still adhering to our initial definition! + +## Barrier Synchronisation + +In the last sub-chapter, we talked about the [Fork - Join Model](./multithreading.md#fork-join-parallel-execution-model). We know that **"Once the team threads complete the parallel region, they `synchronise` and return to the pool, leaving only the master thread that executes sequentially."**. However, there are a few important aspects that we have left out: +- The time taken to finish the assigned task is **different** for each thread. +- How can OpenMP know/identify **when** a thread has completed its own task? +- How can OpenMP know/identify **when** all threads have finished all the tasks? + +The answer lies in something called **Barrier Synchronisation**. Here are illustrations for the idea: + +![Barrier Illustration](./imgs/barrier.png) + +![Barrier Wait](./imgs/barrier-wait.png) + +![Barrier End](./imgs/barrier-end.png) + +### Implicit Barriers + +Barrier synchronisation implicitly occurs (behind the scenes) at the end of constructs such as the parallel construct (`#pragma omp parallel`) and at the end of worksharing constructs (loop, sections, single, and workshare). + +```c +#include <stdio.h> +#include <omp.h> + +int main(void) +{ + #pragma omp parallel + { + // Parallel code + printf("Thread %d is executing...\n", omp_get_thread_num()); + } + + // Sequential code after the barrier + printf("Main thread\n"); + return 0; +} +``` + +### Barrier Construct + +The barrier construct specifies an **explicit** (we add the construct into the code ourselves) barrier at the point at which the construct appears. The barrier construct is a stand-alone directive. Here is an illustration of the following code. + +![Explicit Barrier](./imgs/explicit-barrier.png) + +```c +#include <stdio.h> +#include <omp.h> + +int main(void) +{ + #pragma omp parallel + { + printf("Thread %d executes part 1\n", omp_get_thread_num()); + #pragma omp barrier + + // No thread will execute part 2 before part 1 + printf("Thread %d executes part 2\n", omp_get_thread_num()); + } + return 0; +} +``` + +### Let's think about a way to implement a barrier + +We don't need to know exactly how OpenMP implements this feature, at least not right now (if you are interested in OpenMP implementation, [here](https://www.openmp.org/spec-html/5.0/openmpse25.html) could be a start). We can follow a rough simple approach: + +- Let's assume we have `n` threads. +- We need a way to count how many threads have finished; this can easily be done with a shared counter variable (be careful with race conditions) among threads. When this counter reaches the number `n`, we will know that all threads have finished. +- We also need a mechanism to make a finished thread idle and **wait()** for other threads to finish. +- The last thread to finish has the responsibility to **notify()** the other threads (the threads that you want to execute after the barrier). + +Voila! We have a barrier. A rough sketch of this idea is shown below. We will implement a barrier as part of a mini-project using [POSIX threads](https://docs.oracle.com/cd/E26502_01/html/E35303/tlib-1.html).
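To make the counting idea above concrete, here is a minimal sketch of such a barrier built from a Pthreads mutex and condition variable. It is only a sketch: the names (`my_barrier_t`, `my_barrier_wait`, and the worker set-up around them) are ours for illustration, not part of the mini-project.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

// A minimal counting barrier built from a mutex and a condition variable.
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int count;       // threads that have arrived in the current cycle
    int total;       // threads the barrier waits for
    int generation;  // incremented every cycle so threads don't leave early or late
} my_barrier_t;

void my_barrier_init(my_barrier_t *b, int total) {
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = 0;
    b->total = total;
    b->generation = 0;
}

void my_barrier_wait(my_barrier_t *b) {
    pthread_mutex_lock(&b->mutex);
    int my_generation = b->generation;
    b->count++;
    if (b->count == b->total) {
        // Last thread to arrive: reset the counter and wake everyone up (the notify() step).
        b->count = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cond);
    } else {
        // Wait until the last thread of this generation has arrived (the wait() step).
        while (my_generation == b->generation) {
            pthread_cond_wait(&b->cond, &b->mutex);
        }
    }
    pthread_mutex_unlock(&b->mutex);
}

my_barrier_t barrier;

void *worker(void *arg) {
    int id = *(int *)arg;
    printf("Thread %d doing part 1\n", id);
    my_barrier_wait(&barrier);  // no thread starts part 2 before all have finished part 1
    printf("Thread %d doing part 2\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    my_barrier_init(&barrier, NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}
```

The `generation` counter is what makes the barrier reusable: a thread only leaves `pthread_cond_wait` once the last arrival of its own cycle has incremented it, which plays the role of the **notify()** step described above. It can be compiled with something like `gcc -pthread barrier.c -o barrier`.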