Chapter/parallel computing completed #13
Merged
9 commits
c693365
Added challenges page to distributed computing chapter
oraqlle a75392c
add .gitignore
isoyuki a3d78ff
add contents of parallel-computing
isoyuki 26664a3
address comments in initial PR
isoyuki 582a191
address comments in initial PR
isoyuki 0129491
Add descriptive names to links and use relative paths
isoyuki 485a223
Merge pull request #18 from MonashDeepNeuron/parallel-computing-stash
oraqlle 93d5626
Marked challenges section of parallel computing chapter as 'under con…
oraqlle 90491d0
Fixed up part on `#pragma` for OpenMP.
oraqlle
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -33,3 +33,6 @@ | ||
| # mdBook Output | ||
| book | ||
| # Added by Yuki | ||
| .obsidian | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,63 @@ | ||
| # Challenges | ||
|
|
||
| 🚧 Under Construction 🏗️ | ||
|
|
||
| ## Task 1 - Parallelise `for` Loop | ||
|
|
||
| Goal: To create the array `[0, 1, 2, ..., 19]` | ||
|
|
||
| 1. Git clone [HPC-Training-Challenges](https://github.com/MonashDeepNeuron/HPC-Training-Challenges) | ||
| 2. Go to the directory `challenges/parallel-computing`. Compile `array.c` and execute it. Check the run time of the serial code | ||
| 3. Add `#pragma<>` | ||
| 4. Compile the code again | ||
| 5. Run the parallel code and check the improved run time | ||
|
|
||
| ## Task 2 - Run Task 1 on the HPC Cluster | ||
|
|
||
| 1. Check the available partitions with `show_cluster` | ||
| 2. Modify `RunHello.sh` | ||
| 3. Submit the job with `sbatch RunHello.sh` | ||
| 4. `cat slurm<>.out` and check the run time | ||
|
|
||
| > You can also use [Strudel Web](https://beta.desktop.cvl.org.au/login) to run the script without `sbatch` | ||
|
|
||
| ## Task 3 - Reduction Clause | ||
|
|
||
| Goal: To find the sum of the array elements | ||
|
|
||
| 1. Compile `reduction.c` and execute it. Check the run time | ||
| 2. Add `#pragma<>` | ||
| 3. Compile `reduction.c` again | ||
| 4. Run the parallel code and check the improved run time. Make sure you get the same result as the serial code | ||
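A hedged sketch of what the finished directive could look like; the function and variable names are illustrative, not taken from `reduction.c`:

```c
#include <omp.h>

/* Sum the elements of a[0..n-1]. reduction(+:sum) gives each thread a
 * private partial sum and combines them when the parallel loop ends. */
long sum_array(const int *a, int n)
{
    long sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

Without the reduction clause, the threads would race on `sum` and the result could differ from the serial code's.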
|
|
||
| > Run `module load gcc` to use a newer version of GCC if you get an error relating to something like `-std=c99` | ||
|
|
||
| ## Task 4 - Private Clause | ||
|
|
||
| The goal of this task is to square each value in the array and find their sum | ||
| 1. Compile `private.c` and execute it. Check the run time. `#include` the standard header `<math.h>` and link the math library | ||
| 2. Add `#pragma<>` | ||
| 3. Compile `private.c` again | ||
| 4. Run the parallel code and check the improved run time | ||
|
|
||
| ## Task 5 - Calculate Pi using "Monte Carlo Algorithm" | ||
|
|
||
| Goal: To estimate the value of pi through simulation | ||
|
|
||
| - No instructions for this task. Use what you have learnt in the previous tasks to parallelise the code! | ||
| - You should get a result close to pi (3.1415...) | ||
|
|
||
| A short explanation of the Monte Carlo algorithm: | ||
|
|
||
| [YouTube Video: Monte Carlo Simulation](https://www.youtube.com/watch?v=7ESK5SaP-bc&ab_channel=MarbleScience) | ||
|
|
||
|  | ||
|
|
||
| ## Bonus - Laplace Equation to Calculate the Temperature of a Square Plane | ||
|
|
||
| - Modify `laplace2d.c` | ||
| - Use Makefile to compile the code | ||
| - Make the program as fast as you can | ||
|
|
||
| Brief Algorithm of Laplace equation: | ||
|  |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,3 @@ | ||
| # Parallel Computing | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,65 @@ | ||
| # Multithreading | ||
| # Multithreading on HPC | ||
|
|
||
| ## Thread vs Process | ||
|
|
||
|  | ||
|
|
||
| When a computer runs a program, your source code is loaded into RAM and a process is started. | ||
| A **process** is a collection of code, memory, data and other resources. | ||
| A process runs in a unique address space, so two processes cannot see each other’s memory. | ||
|
|
||
| A **thread** is a sequence of code that is executed inside the scope of the **process**. You can (usually) have multiple **threads** executing concurrently within the same process. | ||
| **Threads** can view the memory (i.e. variables) of other threads within the same process. | ||
|
|
||
| A **multiprocessing** system has more than one processor, whereas **multithreading** is a program execution technique that allows a single process to have multiple code segments executing concurrently. | ||
|
|
||
| ## Architecture of an HPC Cluster (Massive) | ||
|
|
||
|  | ||
|
|
||
| The key to HPC is writing parallel code that utilises multiple nodes at the same time; essentially, the more computers you use, the faster your application can run. | ||
|
|
||
| ## Using Massive | ||
|
|
||
| ### Find Available Partition | ||
|
|
||
| Command: | ||
| ```bash | ||
| show_cluster | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| Before you run your job, it’s important to check the available resources. | ||
|
|
||
| `show_cluster` is a good command to check the available resources such as CPU and memory. Make sure to also check the status of the node, so that your jobs can start without waiting. | ||
|
|
||
| ### Sending Jobs | ||
|
|
||
| Command: | ||
| ```bash | ||
| #SBATCH --flag=value | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| Here is an example of a shell script for running a multithreaded job. | ||
| The `#SBATCH` directives specify resources, and then the script runs the executable named `hello`. | ||
|
|
||
| `--ntasks` specifies how many processes to run. | ||
| `--cpus-per-task` is fairly self-explanatory: it specifies how many CPU cores you need per process, and this will be the number of threads used in the job. | ||
| Make sure to also specify which partition you are using. | ||
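Putting those pieces together, a minimal job script might look like the sketch below. The job name, time limit and partition are placeholders, not values from the actual `RunHello.sh`; check your cluster's documentation for real values.

```shell
#!/bin/bash
#SBATCH --job-name=hello       # placeholder job name
#SBATCH --ntasks=1             # run one process
#SBATCH --cpus-per-task=4      # four cores -> four OpenMP threads
#SBATCH --time=00:05:00        # placeholder time limit
#SBATCH --partition=comp       # placeholder partition; check show_cluster

# Match the OpenMP thread count to the cores Slurm allocated
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello
```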
|
|
||
| ### Monitor Jobs | ||
|
|
||
| Command: | ||
| ```bash | ||
| squeue | ||
| # or | ||
| squeue -u <username> | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| After you have submitted your job, you can use the command `squeue` to monitor it. | ||
| You can see the status of your job to check whether it is pending or running, and how long it has been running since it started. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,96 @@ | ||
| # OpenMP | ||
| # Parallel Computing with OpenMP | ||
|
|
||
| ## What is OpenMP | ||
|
|
||
| OpenMP, which stands for Open Multi-Processing, is an API for writing multithreaded applications. | ||
|
|
||
| It has a set of compiler directives and library routines for parallel applications, and it greatly simplifies writing multi-threaded code in Fortran, C and C++. | ||
|
|
||
| Just a few lines of additional code can make your application parallel. | ||
|
|
||
| OpenMP uses a shared memory architecture. It assumes all code runs on a single server. | ||
|
|
||
| ## Threads | ||
|
|
||
|  | ||
|
|
||
| A thread of execution is the smallest sequence of instructions that can be managed independently by an operating system. | ||
|
|
||
| In a parallel region, multiple threads are spawned and utilise the cores of the CPU. | ||
|
|
||
| > Only one thread exists in a serial region | ||
|
|
||
| ## OpenMP Compiler Directives | ||
|
|
||
| Recall compiler directives in C; particularly the `#pragma` directive. These can be used to create custom functionality for a compiler and enable specialized features in-code. OpenMP provides a set of `#pragma` directives that can be used to specify the parallelization of a particular loop or section of code. For example, the `#pragma omp parallel` directive is used to start a parallel region, where multiple threads can execute the code concurrently. The `#pragma omp for` directive is used to parallelize a loop, with each iteration of the loop being executed by a different thread. | ||
|
|
||
| Here's an example of how `#pragma` directives can be used with OpenMP to parallelize a simple loop: | ||
|
|
||
|
|
||
| Use `gcc -fopenmp` to compile your code when you use OpenMP `#pragma` directives | ||
|
|
||
| ## Compile OpenMP | ||
|
|
||
| 1. Add `#include <omp.h>` if you are using OpenMP functions | ||
| 2. Run `gcc -fopenmp -o hello hello.c` | ||
|
|
||
| ## How it works | ||
|
|
||
|  | ||
| [Source](https://www.researchgate.net/figure/OpenMP-API-The-master-thread-is-indicated-with-T-0-while-inside-the-parallel-region_fig3_329536624 | ||
| ) | ||
|
|
||
| Here is an example of `#pragma` | ||
| - The function starts with a serial region | ||
| - At the line `#pragma omp parallel`, a group of threads is spawned to create a parallel region inside the braces | ||
| - At the end of the braces, the program goes back to serial execution | ||
|
|
||
| ## Running "Hello World" on Multi-threads | ||
|
|
||
| >If you're unsure about the difference between **multi-threading** and **multi-processing**, check the page [here](multithreading.md) | ||
|
|
||
| **Drawing in Serial (Left) vs Parallel (Right)** | ||
|  | ||
|
|
||
| Drawing in serial versus drawing in parallel: on the left we place one pixel at a time and take a long time to finish the drawing. On the right, if we choose to load and place four pixels simultaneously, we get the picture faster; however, during execution it can be hard to make out what the final image will be, since we don't know which pixel will be placed where at each execution step. | ||
|
|
||
| Now, this is obviously a fairly abstract analogy for what is happening under the hood. However, recall the earlier diagram with zones of multiple threads and serial zones: some parts of a program must stay serial. If this program went further and drew a happy face and then a frowning face, drawing both at the same time would not be useful; yes, it would be drawn faster, but the final image would not make sense or achieve the goal of the program. | ||
|
|
||
| ## How many threads? You can dynamically change it | ||
|
|
||
| **`omp_set_num_threads()` Library Function** | ||
| The value is set inside the program, so you need to recompile the program to change it. | ||
|
|
||
| **`OMP_NUM_THREADS` Environment Variable** | ||
|
|
||
| ```bash | ||
| export OMP_NUM_THREADS=4 | ||
| ./hello | ||
| ``` | ||
|
|
||
| The operating system maps the threads to available hardware. You would not normally want to exceed the number of cores/processors available to you. | ||
|
|
||
| ## Measuring Performance | ||
|
|
||
| The commands `top` and `htop` let you look into a process. As you can see from the image, they show the CPU usage. | ||
|
|
||
|  | ||
|
|
||
| The command `time` checks the overall performance of the code. | ||
|
|
||
|  | ||
|
|
||
| By running this command, you get real time, user time and system time. | ||
|
|
||
| **Real** is wall clock time - the time from start to finish of the call. This includes overhead. | ||
|
|
||
| **User** is the amount of CPU time spent outside the kernel within the process. | ||
|
|
||
| **Sys** is the amount of CPU time spent in the kernel within the process. | ||
| **User** time + **Sys** time will tell you how much actual CPU time your process used. | ||
|
|
||
| ## More Features of OpenMP | ||
|
|
||
| - [YouTube Video: Introduction to OpenMP](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11 ) | ||
| - [YouTube Video: Data environment -\#pragma omp parallel private](https://www.youtube.com/watch?v=dlrbD0mMMcQ&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=17) | ||
| - [YouTube Video: Parallel Loops - \#omp parallel for reduction()](https://www.youtube.com/watch?v=iPb6OLhDEmM&list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG&index=11 ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,41 @@ | ||
| # What is Parallel Computing? | ||
| # Introduction to Parallel Computing | ||
|
|
||
| ## What is Parallel Computing? | ||
|
|
||
| Parallel computing is about executing the instructions of a program simultaneously. | ||
|
|
||
| One of the core values of computing is breaking a big problem down into smaller, easier-to-solve problems, or at least smaller problems. | ||
|
|
||
| In some cases, the steps required to solve the problem can be executed simultaneously (in parallel) rather than sequentially (in order) | ||
|
|
||
| A supercomputer is not just about fast processors; it is multiple processors working together simultaneously. Therefore, it makes sense to utilise parallel computing in an HPC environment, given the access to large numbers of processors. | ||
|
|
||
|  | ||
|
|
||
| An example of parallel computing looks like this. | ||
|
|
||
|  | ||
|
|
||
| Here there is an array which contains the numbers 0 to 999, and the program increments each value by 1. Comparing the serial code on the left with the parallel code on the right, the parallel code utilises 4 cores of a CPU, so we can expect approximately a 4x speed-up over using 1 core: the same code executes faster because four times as many elements can be updated in the same amount of time. | ||
|
|
||
| ## Parallel Computing Memory Architectures | ||
|
|
||
| Parallel computing has various memory architectures | ||
|
|
||
| ### Shared Memory Architecture: | ||
|
|
||
| In a shared memory architecture, multiple CPUs run on the same server and share the same memory. OpenMP uses this model. | ||
|
|
||
|  | ||
|
|
||
| ### Distributed Memory Architecture: | ||
|
|
||
| In a distributed memory architecture, CPU and memory are bundled together on each node, and nodes work by communicating with each other. A message passing protocol called MPI is used in this model. | ||
|
|
||
|  | ||
|
|
||
| ### Hybrid Parallel Programming: | ||
|
|
||
| For High Performance Computing (HPC) applications, OpenMP is combined with MPI. This is often referred to as Hybrid Parallel Programming. | ||
|
|
||
|  | ||
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # Challenges |