diff --git a/src/chapter4/challenges.md b/src/chapter4/challenges.md
index 07c30df..8bced1f 100644
--- a/src/chapter4/challenges.md
+++ b/src/chapter4/challenges.md
@@ -1,11 +1,36 @@
 # Challenges
 
-## Task 1 - Parallise for Loop
+## Task 1 - Parallelise `for` Loop
 
-Goal: To to create an array [0,1,2………...19]
+Goal: To create an array `[0, 1, 2, ..., 19]`
 
-1. Git clone https://github.com/Yusuke710/HPC_training2021.git
-2. Go to the directory “question”. Compile array.c and execute it. Check the run time of the serial code
+1. Git clone [HPC-Training-Challenges](https://github.com/MonashDeepNeuron/HPC-Training-Challenges)
+2. Go to the directory “challenges/parallel-computing”. Compile `array.c` and execute it. Check the run time of the serial code
 3. Add `#pragma<>`
 4. Compile the code again
 5. Run parallel code and check the improved run time
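+
+A minimal sketch of what `array.c` might look like after step 3, assuming the serial version simply fills the array in a `for` loop (the actual challenge file may differ):
+
+```c
+// Compile with: gcc -fopenmp array.c
+#include <stdio.h>
+
+int main(void)
+{
+    int array[20];
+
+    // Each iteration is independent, so OpenMP can split the loop across threads
+    #pragma omp parallel for
+    for (int i = 0; i < 20; i++) {
+        array[i] = i;
+    }
+
+    for (int i = 0; i < 20; i++) {
+        printf("%d ", array[i]);
+    }
+    printf("\n");
+
+    return 0;
+}
+```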
@@ -17,8 +42,7 @@ Goal: To to create an array `[0,1,2………...19]`
 3. `sbatch RunHello.sh`
 4. `cat slurm<>.out` and check the run time
 
->[!note]
->You can also use strudel web to run the script without sbatch: https://beta.desktop.cvl.org.au/login
+>You can also use [strudel web](https://beta.desktop.cvl.org.au/login) to run the script without sbatch
 
 ## Task 3 - Reduction Clause
 
@@ -29,7 +53,31 @@ Goal: To find the sum of the array elements
 3. Compile `reduction.c` again
 4. Run parallel code and check the improved run time. Make sure you got the same result as the serial code
 
->[!note]
 >`module load gcc` to use newer version of gcc if you have error with something like `-std=c99`
 
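+One directive does the work here too; the `reduction` clause gives each thread a private partial sum and combines them when the loop finishes. A rough sketch of the parallelised `reduction.c`, assuming the serial version sums a fixed-size array (sizes and names here are illustrative):
+
+```c
+#include <stdio.h>
+
+int main(void)
+{
+    int array[1000];
+    long sum = 0;
+
+    for (int i = 0; i < 1000; i++) {
+        array[i] = i;
+    }
+
+    // Without reduction(+:sum), threads would race on the shared sum variable
+    #pragma omp parallel for reduction(+:sum)
+    for (int i = 0; i < 1000; i++) {
+        sum += array[i];
+    }
+
+    printf("sum = %ld\n", sum);
+    return 0;
+}
+```
+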
 ## Task 4 - Private clause
 
@@ -49,9 +97,41 @@ Goal: To estimate the value of pi from simulation
 
 Short explanation of Monte Carlo algorithm:
 
-[https://www.youtube.com/watch?v=7ESK5SaP-bc&ab_channel=MarbleScience](https://www.youtube.com/watch?v=7ESK5SaP-bc&ab_channel=MarbleScience)
+[YouTube Video: Monte Carlo Simulation](https://www.youtube.com/watch?v=7ESK5SaP-bc&ab_channel=MarbleScience)
 
-![](src/chapter4/_attachments/Pasted%20image%2020230326142805.png)
+![Monte Carlo](imgs/Monte%20Carlo.png)
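+
+For orientation, a bare-bones OpenMP Monte Carlo estimate of pi in C; the challenge's own starter code will differ, and `rand_r()` is POSIX-specific (assumed available on the cluster):
+
+```c
+#include <omp.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(void)
+{
+    const long trials = 10000000;
+    long hits = 0;
+
+    #pragma omp parallel reduction(+:hits)
+    {
+        // Per-thread seed so the random number generator is safe to call concurrently
+        unsigned int seed = 1234u + omp_get_thread_num();
+
+        #pragma omp for
+        for (long i = 0; i < trials; i++) {
+            double x = (double)rand_r(&seed) / RAND_MAX;
+            double y = (double)rand_r(&seed) / RAND_MAX;
+            if (x * x + y * y <= 1.0) {
+                hits++;
+            }
+        }
+    }
+
+    printf("pi is roughly %f\n", 4.0 * hits / trials);
+    return 0;
+}
+```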
 
 ## Bonus - Laplace equation to calculate the temperature of a square plane
 
@@ -60,4 +140,4 @@
 - Make the program as fast as you can
 
 Brief Algorithm of Laplace equation:
-![](src/chapter4/_attachments/Pasted%20image%2020230326142826.png)
\ No newline at end of file
+![Laplace Equation Algorithm](imgs/Pasted%20image%2020230326142826.png)
\ No newline at end of file
diff --git a/src/chapter4/_attachments/4 Parallel Computing OpenMP.gif b/src/chapter4/imgs/4 Parallel Computing OpenMP.gif
similarity index 100%
rename from src/chapter4/_attachments/4 Parallel Computing OpenMP.gif
rename to src/chapter4/imgs/4 Parallel Computing OpenMP.gif
diff --git a/src/chapter4/_attachments/Pasted image 20230325110408.png b/src/chapter4/imgs/Distributed Memory Architecture.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325110408.png
rename to src/chapter4/imgs/Distributed Memory Architecture.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325110529.png b/src/chapter4/imgs/Hybrid Parallel Programming.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325110529.png
rename to src/chapter4/imgs/Hybrid Parallel Programming.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326142805.png b/src/chapter4/imgs/Monte Carlo.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326142805.png
rename to src/chapter4/imgs/Monte Carlo.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325112426.png b/src/chapter4/imgs/OpenMP and Directive.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325112426.png
rename to src/chapter4/imgs/OpenMP and Directive.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325110040.png b/src/chapter4/imgs/Parallel Computing Example.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325110040.png
rename to src/chapter4/imgs/Parallel Computing Example.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325113147.png b/src/chapter4/imgs/Pasted image 20230325113147.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325113147.png
rename to src/chapter4/imgs/Pasted image 20230325113147.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325113254.png b/src/chapter4/imgs/Pasted image 20230325113254.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325113254.png
rename to src/chapter4/imgs/Pasted image 20230325113254.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325113303.png b/src/chapter4/imgs/Pasted image 20230325113303.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325113303.png
rename to src/chapter4/imgs/Pasted image 20230325113303.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325113312.png b/src/chapter4/imgs/Pasted image 20230325113312.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325113312.png
rename to src/chapter4/imgs/Pasted image 20230325113312.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325113329.png b/src/chapter4/imgs/Pasted image 20230325113329.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325113329.png
rename to src/chapter4/imgs/Pasted image 20230325113329.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326141615.png b/src/chapter4/imgs/Pasted image 20230326141615.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326141615.png
rename to src/chapter4/imgs/Pasted image 20230326141615.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326142826.png b/src/chapter4/imgs/Pasted image 20230326142826.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326142826.png
rename to src/chapter4/imgs/Pasted image 20230326142826.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325105945.png b/src/chapter4/imgs/Running Processes in Parallel.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325105945.png
rename to src/chapter4/imgs/Running Processes in Parallel.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325110257.png b/src/chapter4/imgs/Shared Memory Architecture.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325110257.png
rename to src/chapter4/imgs/Shared Memory Architecture.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326141219.png b/src/chapter4/imgs/Slurm Architecture.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326141219.png
rename to src/chapter4/imgs/Slurm Architecture.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325112805.png b/src/chapter4/imgs/Thread vs Processes.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325112805.png
rename to src/chapter4/imgs/Thread vs Processes.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325111415.png b/src/chapter4/imgs/Threads Visualisation.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325111415.png
rename to src/chapter4/imgs/Threads Visualisation.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325114751.png b/src/chapter4/imgs/Time Command.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325114751.png
rename to src/chapter4/imgs/Time Command.png
diff --git a/src/chapter4/_attachments/Pasted image 20230325114732.png b/src/chapter4/imgs/Top Command.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230325114732.png
rename to src/chapter4/imgs/Top Command.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326141618.png b/src/chapter4/imgs/sbatch Command.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326141618.png
rename to src/chapter4/imgs/sbatch Command.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326141406.png b/src/chapter4/imgs/show_cluster Command.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326141406.png
rename to src/chapter4/imgs/show_cluster Command.png
diff --git a/src/chapter4/_attachments/Pasted image 20230326141710.png b/src/chapter4/imgs/squeue Command.png
similarity index 100%
rename from src/chapter4/_attachments/Pasted image 20230326141710.png
rename to src/chapter4/imgs/squeue Command.png
diff --git a/src/chapter4/multithreading.md b/src/chapter4/multithreading.md
index a77e4dd..25a1dc7 100644
--- a/src/chapter4/multithreading.md
+++ b/src/chapter4/multithreading.md
@@ -2,7 +2,7 @@
 
 ## Thread vs Process
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325112805.png)
+![Thread vs Processes](imgs/Thread%20vs%20Processes.png)
 
 When computer runs a program, your source code is loaded into RAM and process is started.
 A **process** is a collection of code, memory, data and other resources.
@@ -15,7 +15,7 @@ A **multiprocessing** system has more than two processors, whereas **multithread
 
 ## Architecture of a HPC Cluster (Massive)
 
-![](src/chapter4/_attachments/Pasted%20image%2020230326141219.png)
+![Slurm Architecture](imgs/Slurm%20Architecture.png)
 
 The key in HPC is to write a parallel computing code that utilise multiple nodes at the same time.
 essentially, more computers faster your application
@@ -23,9 +23,12 @@
 
 ### Find Available Partition
 
-command: `show_cluster`
+Command:
+```bash
+show_cluster
+```
 
-![](src/chapter4/_attachments/Pasted%20image%2020230326141406.png)
+![show_cluster Command](imgs/show_cluster%20Command.png)
 
 Before you run your job, it’s important to check the available resources.
 
@@ -33,9 +36,12 @@
 
 ### Sending Jobs
 
-command: `#SBATCH`
+Command:
+```bash
+#SBATCH --flag=value
+```
 
-![](src/chapter4/_attachments/Pasted%20image%2020230326141618.png)
+![sbatch Command](imgs/sbatch%20Command.png)
 
 Here is the example of shell script for running multi-threading job
 `#sbatch` specifies resources and then it runs the executable named hello.
@@ -46,9 +52,14 @@ And make sure to specify which partition you are using
 
 ### Monitor Jobs
 
-command: `squeue` or `squeue -u `
+Command:
+```bash
+squeue
+# or
+squeue -u <username>
+```
 
-![](src/chapter4/_attachments/Pasted%20image%2020230326141710.png)
+![squeue Command](imgs/squeue%20Command.png)
 
 After you submitted your job, you can use the command squeue to monitor your job
 you can see the status of your job to check whether it’s pending or running and also how long has it been since the job has started.
\ No newline at end of file
diff --git a/src/chapter4/openmp.md b/src/chapter4/openmp.md
index 26216db..296dc39 100644
--- a/src/chapter4/openmp.md
+++ b/src/chapter4/openmp.md
@@ -12,13 +12,12 @@ OpenMP uses shared memory architecture. It assumes all code runs on a single ser
 
 ## Threads
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325111415.png)
+![Threads Visualisation](imgs/Threads%20Visualisation.png)
 
 A thread of execution is the smallest instruction that can be managed independently by an operating system.
 
 In parallel region, multiple threads are spawned and utilises the cores on CPU
 
-> [!note]
 > Only one thread exists in a serial region
 
 ## Compiler Directive \# pragma
@@ -28,8 +27,28 @@ In parallel region, multiple threads are spawned and utilises the cores on CPU
 
 - `#include <omp.h>`
 - `#pragma omp parallel`
 
-Use `gcc -fopenmp` to compile your code when you use `#pragma`
+OpenMP provides a set of `#pragma` directives that specify how a particular loop or section of code should be parallelised. For example, the `#pragma omp parallel` directive starts a parallel region, in which multiple threads execute the code concurrently. The `#pragma omp for` directive parallelises a loop, with each iteration of the loop being executed by a different thread.
+
+Here's an example of how `#pragma` directives can be used with OpenMP to parallelise a simple loop:
+
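+```c
+#include <omp.h>
+#include <stdio.h>
+
+int main(void)
+{
+    // Spawn a team of threads; each takes a share of the iterations
+    #pragma omp parallel for
+    for (int i = 0; i < 8; i++) {
+        printf("iteration %d ran on thread %d\n", i, omp_get_thread_num());
+    }
+    return 0;
+}
+```
+
+(A minimal illustration: the iteration-to-thread mapping is chosen by the runtime, so the output order will vary between runs.)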
+
+Use `gcc -fopenmp` to compile your code when you use `#pragma`
 
 ## Compile OpenMP
@@ -38,7 +57,7 @@ Use `gcc -fopenmp` to compile your code when you use `#pragma`
 
 ## How it works
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325112426.png)
+![OpenMP and Directive](imgs/OpenMP%20and%20Directive.png)
 
 [Source](https://www.researchgate.net/figure/OpenMP-API-The-master-thread-is-indicated-with-T-0-while-inside-the-parallel-region_fig3_329536624 )
 
@@ -49,11 +68,10 @@ Here is an example of `#pragma`
 
 ## Running "Hello World" on Multi-threads
 
->[!info]
->If you're unsure about the difference between **multi-threading** and **multi-processing**, check the page [here](src/chapter4/multithreading.md)
+>If you're unsure about the difference between **multi-threading** and **multi-processing**, check the page [here](multithreading.md)
 
 **Drawing in Serial (Left) vs Parallel (Right)**
-![](src/chapter4/_attachments/4%20Parallel%20Computing%20OpenMP.gif)
+![Serial vs Parallel Drawing](imgs/4%20Parallel%20Computing%20OpenMP.gif)
 
 Drawing in serial versus drawing in parallel, you can see how we can place one pixel at a time and take a long time to make the drawing, but on the right hand side if we choose to load and place four pixels down simultaneously we can get the picture faster, however during the execution it can be hard to make out what the final image will be, given we don’t know what pixel will be placed where in each execution step.
@@ -77,11 +95,11 @@ The operating system maps the threads to available hardware. You would not norma
 
 The command `top` or `htop` looks into a process.
 As you can see from the image on right, it shows the CPU usages.
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325114732.png)
+![Top Command](imgs/Top%20Command.png)
 
 The command `time` checks the overall performance of the code.
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325114751.png)
+![Time Command](imgs/Time%20Command.png)
 
 By running this command, you get real time, user time and system time.
@@ -94,6 +112,6 @@ By running this command, you get real time, user time and system time.
 
 ## More Features of OpenMP
 
-- [Introduction to OpenMP](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11 )
-- [\#pragma omp parallel private](https://www.youtube.com/watch?v=dlrbD0mMMcQ&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=17)
-- [\#omp parallel for reduction()](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11 )
\ No newline at end of file
+- [YouTube Video: Introduction to OpenMP](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11)
+- [YouTube Video: Data environment - \#pragma omp parallel private](https://www.youtube.com/watch?v=dlrbD0mMMcQ&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=17)
+- [YouTube Video: Parallel Loops - \#omp parallel for reduction()](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11)
\ No newline at end of file
diff --git a/src/chapter4/parallel-computing.md b/src/chapter4/parallel-computing.md
index 19c1bb3..92836b9 100644
--- a/src/chapter4/parallel-computing.md
+++ b/src/chapter4/parallel-computing.md
@@ -1,4 +1,4 @@
-# Parallel Computing
+# Introduction to Parallel Computing
 
 ## What is Parallel Computing?
 
@@ -6,15 +6,40 @@ Parallel computing is about executing the instructions of the program simultaneo
 
 One of the core values of computing is the breaking down of a big problem into smaller easier to solve problems, or at least smaller problems.
 
-In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than serially (in order)
+In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than sequentially (in order)
 
 A supercomputer is not just about fast processors. It is multiple processors working together in simultaneously. Therefore it makes sense to utilise parallel computing in the HPC environment, given the access to large numbers of processors
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325105945.png)
+![Running Processes in Parallel](imgs/Running%20Processes%20in%20Parallel.png)
 
 An example of parallel computing looks like this.
 
-![](src/chapter4/_attachments/Pasted%20image%2020230325110040.png)
+![Parallel Computing Example](imgs/Parallel%20Computing%20Example.png)
 
 Here there is an array which contains numbers from 0 to 999. The program is to increment each values by 1. Comparing serial code on left and parallel code on right, parallel code is utilising 4 cores of a CPU. Therefore, it can expect approximately 4 times speed up from just using 1 core, what we are seeing here is how the same code can in-fact execute faster as four times as many elements can be updated in the same time one would be.
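+
+As a minimal sketch of the same idea in OpenMP-style C (illustrative only, not the book's actual source):
+
+```c
+#include <stdio.h>
+
+int main(void)
+{
+    int array[1000];
+
+    for (int i = 0; i < 1000; i++) {
+        array[i] = i;
+    }
+
+    // Serial: one core walks all 1000 elements.
+    // With 4 threads, each core handles roughly 250 of them.
+    #pragma omp parallel for
+    for (int i = 0; i < 1000; i++) {
+        array[i] += 1;
+    }
+
+    printf("array[999] = %d\n", array[999]);
+    return 0;
+}
+```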
 
@@ -22,17 +47,45 @@
 Parallel computing has various memory architectures
 
-**Shared Memory Architecture:**
-![](src/chapter4/_attachments/Pasted%20image%2020230325110257.png)
+### Shared Memory Architecture
 
 There is shared memory architectures where multiple CPUs runs on the same server. OpenMP uses this model
 
-**Distributed Memory Architecture:**
-![](src/chapter4/_attachments/Pasted%20image%2020230325110408.png)
+![Shared Memory Architecture](imgs/Shared%20Memory%20Architecture.png)
+
+### Distributed Memory Architecture
 
 This distributed memory architecture where CPU and memory are bundled together and works by communicating with other nodes. Message passing protocol called lMPI is used in this model
 
-**Hybrid Parallel Programming:**
-![](src/chapter4/_attachments/Pasted%20image%2020230325110529.png)
+![Distributed Memory Architecture](imgs/Distributed%20Memory%20Architecture.png)
+
+### Hybrid Parallel Programming
+
+For High Performance Computing (HPC) applications, OpenMP is combined with MPI. This is often referred to as Hybrid Parallel Programming.
 
-For High Performance Computing (HPC) applications, OpenMP is combined with MPI. This is often referred to as Hybrid Parallel Programming.
\ No newline at end of file
+![Hybrid Parallel Programming](imgs/Hybrid%20Parallel%20Programming.png)
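+
+A tiny hybrid sketch, assuming an MPI library is available alongside OpenMP (compiled with something like `mpicc -fopenmp hybrid.c`); each MPI process spawns its own team of OpenMP threads:
+
+```c
+#include <mpi.h>
+#include <omp.h>
+#include <stdio.h>
+
+int main(int argc, char **argv)
+{
+    MPI_Init(&argc, &argv);
+
+    int rank;
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+
+    // MPI distributes work across nodes; OpenMP threads share memory within a node
+    #pragma omp parallel
+    {
+        printf("rank %d, thread %d\n", rank, omp_get_thread_num());
+    }
+
+    MPI_Finalize();
+    return 0;
+}
+```
\ No newline at end of file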