Merged

Dev #87

46 commits
cdbbfab
Update printing.md
Kevin-Duignan Sep 20, 2023
1e109f8
Initial Apache Spark skeleton
Jan 29, 2024
f3c8d02
Added Apache Spark set-up tutorials; Added challenges for the chapter 6
Feb 1, 2024
b9a2436
Finished internal architecture subchapter
Feb 19, 2024
f489fc9
Added examples code for internals subchapter
Feb 19, 2024
6ee46bc
Finished Challenges subchapter; Added shared variables contents; Adde…
Feb 20, 2024
a1d2a1c
Finished subchapter: Job Batching
Feb 20, 2024
264c494
Finished chapter 6: Draft 1
Feb 20, 2024
f0e0cb4
feat: Full revamp outline draft
linton2000 Mar 12, 2024
1350696
feat: Add spark & ML subchapters
linton2000 Mar 12, 2024
a0b0c1b
Merge branch 'feature/spark' into feature/s1-24-revamp
VincentNguyenDuc Mar 14, 2024
48e7207
Updated spark chapter to chapter 10
VincentNguyenDuc Mar 14, 2024
110b8c2
Added skeleton for chapter 8
VincentNguyenDuc Mar 14, 2024
3c79552
Merge pull request #80 from MonashDeepNeuron/Kevin-Duignan-patch-1
oraqlle Mar 18, 2024
3e2285a
Added OpenMP contents
VincentNguyenDuc Mar 19, 2024
29e8d4a
Race Condition
VincentNguyenDuc Mar 20, 2024
471b2dc
Barrier Synchronisation
VincentNguyenDuc Mar 21, 2024
8ffff98
Separate parallel computing into parallel computing and distributed c…
VincentNguyenDuc Mar 24, 2024
d930525
Updated skeleton for distributed computing
VincentNguyenDuc Mar 24, 2024
0ee8376
Finished locks
VincentNguyenDuc Mar 26, 2024
19627ce
Added challenges
VincentNguyenDuc Mar 26, 2024
a8e4033
Remove map-reduce from chapter 7 and chapter 8
VincentNguyenDuc Mar 26, 2024
4d970fc
feat: Chapter 11 skeleton md files
linton2000 Apr 2, 2024
6689cf1
feat: Add intro and move M3 setup to 1.6
linton2000 Apr 2, 2024
428b9ed
feat: Add nectar setup to 1.7
linton2000 Apr 2, 2024
03a2f5d
feat: Clean up chapter 5 and implement 5.1 - batch vs. cloud
linton2000 Apr 3, 2024
f5df7a6
chore: Merge img folders in chapter 5
linton2000 Apr 3, 2024
e59d3e5
Merge & incorporate Chapter 5 docs into new outline
linton2000 Apr 3, 2024
a9ab0bc
Added message-passing and openmpi
VincentNguyenDuc Apr 3, 2024
febeeb8
feat: Parallel & Distributed computing chapter 5.2
linton2000 Apr 7, 2024
1d1a6fd
Chapter 5.2 update
linton2000 Apr 7, 2024
0a3e22a
Merge pull request #84 from MonashDeepNeuron/feature/linton-content
linton2000 Apr 7, 2024
1ea844a
Merge commit
linton2000 Apr 7, 2024
74a1ae4
Merge pull request #85 from MonashDeepNeuron/feature/chapter-8-parallel
linton2000 Apr 7, 2024
f794f8b
Finish chapter 5.2 parallel & distributed computing intro
linton2000 Apr 7, 2024
5fc2a01
Update Chapter 5 outline
linton2000 Apr 7, 2024
9769d60
added files for chapter 7
Joshua-Riantoputra Apr 7, 2024
81eecc7
Finishg Chapter 5.3
linton2000 Apr 7, 2024
5b734ac
Finish chapter 5.4 & modify subchapters
linton2000 Apr 13, 2024
41cc492
Finish chapters 5.5 & 5.6
linton2000 Apr 13, 2024
5a99339
machine learning and hpc
Joshua-Riantoputra Apr 13, 2024
789b527
Restructure chapters 2 & 4
linton2000 Apr 13, 2024
f578eb5
Split chapter 4 challenges from chapter 2
linton2000 Apr 13, 2024
913025b
optimisation algs
Joshua-Riantoputra Apr 13, 2024
650b76f
Complete Initial S1-24 Revamp
linton2000 Apr 13, 2024
0a61247
Merge pull request #86 from MonashDeepNeuron/feature/s1-24-revamp
linton2000 Apr 13, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -34,6 +34,8 @@ $ mdbook serve --open
- Jaspar Martin
- Yuki Kume
- Osman Haji
- Duc Thanh Vinh Nguyen
- Linton Charles

## Code of Conduct, License & Contributing

1 change: 1 addition & 0 deletions src/.chapter7/challenges.md
@@ -0,0 +1 @@
# Challenges
73 changes: 48 additions & 25 deletions src/SUMMARY.md
@@ -2,15 +2,17 @@

[Welcome](home.md)

- [Getting Started](./chapter1/getting-started.md)
- [Installation & Set-up](./chapter1/getting-started.md)
- [GitHub](./chapter1/github.md)
- [Windows](./chapter1/windows.md)
- [Mac](./chapter1/mac.md)
- [Linux](./chapter1/linux.md)
- [WSL](./chapter1/wsl.md)
- [M3 MASSIVE](./chapter1/m3.md)
- [Nectar Cloud](./chapter1/nectar.md)
- [Challenges](./chapter1/challenges.md)

- [Brief Introduction to C](./chapter2/intro-to-c.md)
- [Intro to C](./chapter2/intro-to-c.md)
- [Hello World](./chapter2/helloworld.md)
- [Compilation](./chapter2/compilation.md)
- [Types & Variables](./chapter2/vars.md)
@@ -20,34 +22,55 @@
- [Control Flow](./chapter2/ctrl-flow.md)
- [Loops](./chapter2/loops.md)
- [Functions](./chapter2/functions.md)
- [Pointers](./chapter2/pointers.md)
- [Dynamic Memory](./chapter2/memory.md)
- [Structures](./chapter2/structs.md)
- [Macros & The Preprocessor](./chapter2/macros.md)
- [Challenges](./chapter2/challenges.md)

- [M3](./chapter3/chapter3.md)
- [Getting Started](./chapter3/start.md)
- [Logging In](./chapter3/login.md)
- [Linux Commands](./chapter3/linux-cmds.md)
- [M3's Shared Filesystem](./chapter3/shared-fs.md)
- [Software and Tooling](./chapter3/software-tooling.md)
- [Bash Scripts](./chapter3/bash.md)
- [Job batching & SLURM](./chapter3/slurm.md)
- [Strudel](./chapter3/strudel.md)
- [Operating Systems](./chapter3/chapter3.md)
- [Computer Architecture](./chapter3/computer-architecture.md)
- [Pointers & Memory](./chapter3/memory-pointers.md)
- [Intro to Linux](./chapter3/linux-intro.md)
- [Threading & Concurrency](./chapter3/threads-concurrency.md)
- [Processes](./chapter3/processes.md)
- [Scheduling Algorithms](./chapter3/scheduling.md)
- [Challenges](./chapter3/challenges.md)

- [Parallel Computing](./chapter4/chapter4.md)
- [What is Parallel Computing?](./chapter4/parallel-computing.md)
- [Multithreading](./chapter4/multithreading.md)
- [OpenMP](./chapter4/openmp.md)
- [More C](./chapter4/chapter4.md)
- [Dynamic Memory](./chapter4/memory.md)
- [Structures](./chapter4/structs.md)
- [Macros & The Preprocessor](./chapter4/macros.md)
- [System Calls](./chapter4/syscalls.md)
- [Spawning Processes & Threads](./chapter4/spawn-procs.md)
- [Challenges](./chapter4/challenges.md)

- [Distributed Computing](./chapter5/chapter5.md)
- [Refresher on Parallelism](./chapter5/parallel-refresher.md)
- [What is Distributed Computing](./chapter5/distributed-computing.md)
- [Message Passing](./chapter5/message-passing.md)
- [OpenMPI](./chapter5/openmpi.md)
- [M3 & SLURM](./chapter5/chapter5.md)

- [Batch Processing vs. Cloud Computing](./chapter5/batch-cloud.md)
- [Parallel & Distributed Computing](./chapter5/parallel-distributed.md)
- [M3 Login - SSH & Strudel](./chapter5/login.md)
- [Intro to SLURM](./chapter5/slurm_intro.md)
- [M3 Interface & Usage](./chapter5/m3-interface.md)
- [Software & Tooling](./chapter5/software-tooling.md)
- [Challenges](./chapter5/challenges.md)

[Acknowledgements](./acknowledgements.md)
- [Introduction to Parallel Computing](./chapter6/chapter6.md)
- [Multithreading](./chapter6/multithreading.md)
- [Synchronisation](./chapter6/synchronisation.md)
- [Locks](./chapter6/locks.md)
- [Message Passing](./chapter6/message-passing.md)
- [Challenges](./chapter6/challenges.md)

- [Parallelisation of Algorithms](./chapter7/chapter7.md)
- [Parallel Search](./chapter7/parallel-search.md)
- [Parallel Sort](./chapter7/parallel-sort.md)
- [Other Parallel Algorithms](./chapter7/other-parallel-algos.md)
- [Machine Learning & HPC](./chapter7/machine-learning-and-hpc.md)
- [Optimisation Algorithms](./chapter7/optim-algos.md)
- [Challenges](./chapter7/challenges.md)

- [Apache Spark](./chapter8/chapter8.md)
- [Installation & Cluster Set-up](./chapter8/set-up.md)
- [Internal Architecture](./chapter8/internals.md)
- [Data Processing](./chapter8/data-processing.md)
- [Job Batching](./chapter8/job-batching.md)
- [Challenges](./chapter8/challenges.md)

[Acknowledgements](./acknowledgements.md)
2 changes: 2 additions & 0 deletions src/acknowledgements.md
@@ -8,6 +8,8 @@ This book is part of Monash DeepNeurons collection of technical information and
- [Osman Haji](https://github.com/Ozzywap)
- [Yuki Kume](https://github.com/UnciaBit)
- [Jaspar Martin](https://github.com/jasparm)
- [Duc Thanh Vinh Nguyen](https://github.com/VincentNguyenDuc)
- [Linton Charles](https://github.com/linton2000)

## Contributors

Binary file added src/chapter1/aaf.png
Binary file added src/chapter1/hpcid.png
Binary file added src/chapter1/join_project.png
19 changes: 13 additions & 6 deletions src/chapter3/start.md → src/chapter1/m3.md
@@ -1,22 +1,29 @@
# Getting Started
# M3 MASSIVE

MASSIVE (Multi-modal Australian ScienceS Imaging and Visualisation Environment) is an HPC supercomputing cluster that you will have access to as an MDN member. On this page we will set you up with access before you learn how to use it in Chapter 5. Feel free to go through the docs to learn about the [hardware config](https://docs.massive.org.au/M3/m3users.html) of M3 (the 3rd version of MASSIVE) and its [institutional governance](https://massive.org.au/about/about.html#governance).

## Request an account

In order to access M3, you will need to request an account. To do this, follow this link: [HPC ID](https://hpc.erc.monash.edu.au/karaage/aafbootstrap). This should take you to a page like this:


![HPC ID](./imgs/aaf.png)
![HPC ID](./aaf.png)

Type in Monash, as you can see here. Select Monash University, and tick the Remember my organisation box at the bottom. Once you continue to your organisation, you will be taken to the Monash University SSO login page, where you will need to log in with your Monash credentials.

You should now see something like this: ![HPC ID System](./imgs/hpcid.png)
You should now see something like this:

![HPC ID System](./hpcid.png)

Once you are here, there are a couple of things you will need to do. The first, and most important, is to set your HPC password. This is the password you will use to log in to M3. To do this, go to Home, then click on Change Linux Password. This will take you through the steps of setting your password.

Once you have done this, you can move on to requesting access to the MDN project and getting access to Gurobi.

## Add to project

To request to join the MDN project, again from the Home page click on Join Existing Project. You should see a screen like this: ![Join Project](./imgs/join_project.png)
To request to join the MDN project, again from the Home page click on Join Existing Project. You should see a screen like this:

![Join Project](./join_project.png)

In the text box type `vf38` and click search. This is the project code for MDN. Then select the project and click submit. You will now have to wait for the project admins to approve your request. Once they have done this, you will be able to access the project. This should not take longer than a few days, and you will get an email telling you when you have access.

@@ -47,4 +54,4 @@ cat ~/.ssh/id_ed25519.pub

Then, go to your GitHub account, go to Settings, and click on the SSH and GPG keys tab. Click on New SSH key, paste the key into the box, give it a name, and click Add SSH key.

You should now be able to clone repos using SSH. To do this, go to the repo you want to clone and copy the SSH link instead of the HTTPS link; from there it's regular git cloning.
Binary file added src/chapter1/nectar-login.png
16 changes: 16 additions & 0 deletions src/chapter1/nectar.md
@@ -0,0 +1,16 @@
# Nectar Cloud

The ARDC Nectar Research Cloud (Nectar) is Australia’s national research cloud, specifically designed for research computing. Like with M3, we will set you up with access now before you learn about it in later chapters. [This webpage](https://ardc.edu.au/services/ardc-nectar-research-cloud/) explains what it is if you're curious.

## Connect Monash Account to Nectar Cloud

To create an [identity](https://medium.com/@ciente/identity-and-access-management-iam-in-cloud-computing-2777481525a4) (account) in Nectar Cloud, all you have to do is log in using your Monash student account. Click [this link](https://dashboard.rc.nectar.org.au) to access Nectar's landing page.

You will see the following. Make sure to click "Login via AAF (Australia)".

![nectar](./nectar-login.png)

You will be redirected to enter your Monash credentials after which you will see the Nectar Cloud dashboard for your trial project (your project name will be pt-xxxxx).

## Cloud Starter Series

ARDC has provided [this cloud starter tutorial series](https://tutorials.rc.nectar.org.au/cloud-starter/01-overview) for people new to Nectar Cloud. You should be able to follow these tutorials using your trial project. If you need more SUs (service units, a.k.a. cloud credits) in order to provision more cloud resources for MDN-related work, you should message your HPC Lead with that request.
12 changes: 0 additions & 12 deletions src/chapter2/challenges.md
@@ -16,8 +16,6 @@ The challenges for this chapter can found in the [HPC Training Challenges](https
- [Challenge 4 - GCD \& LCM](#challenge-4---gcd--lcm)
- [Challenge 5 - Bitwise Add](#challenge-5---bitwise-add)
- [Challenge 6 - Bitwise Multiply](#challenge-6---bitwise-multiply)
- [Challenge 7 - Sum and Product Algorithms](#challenge-7---sum-and-product-algorithms)
- [Challenge 8 - Array Concatenation](#challenge-8---array-concatenation)

## Challenge 1 - Hello World

@@ -44,13 +42,3 @@ For this challenge you have to implement a function called `bitwise_add()` which
This challenge is similar to the last but instead of implementing `+` you must implement `*` (product). Your implementation should be contained in a function called `bitwise_multiply()`. You can use any bitwise or conditional operators.

> Note: If you need `+` you can reimplement it internally in `bitwise_multiply` based on your solution from the previous challenge, import it to a header in this challenges folder and include it or copy it to this folder. Ask a trainer if you get stuck with this.

## Challenge 7 - Sum and Product Algorithms

This challenge involves implementing the sum and product reductions on an array or memory block of integers. As a bonus challenge, try and make the algorithms more generic and work with any binary operator.

## Challenge 8 - Array Concatenation

In this challenge you have to implement an array concatenation function. This should join two arrays of the same type into a single array, similar to `strcat()`. You will need to allocate a new block of memory in order to store the concatenated arrays, which requires the function to know the sizes of both input arrays. This function should return a pointer to the resulting array.

> Note: The type of the array this function concatenates can be any type except `char`.
15 changes: 14 additions & 1 deletion src/chapter2/printing.md
@@ -42,8 +42,21 @@ int main()

> Question: Notice how we used `double` for the type of `sum`. What would happen if `sum` type was `int`?

If you want to have a play with `printf()`, copy the following code snippet run it on your own device. The command will be identically to 'Hello World!'.
If you want to have a play with `printf()`, copy the following code snippet and run it on your own device. The output will show several different variations of 'Hello World!'.

```c
#include <stdio.h>

int main() {
    printf("%30s\n", "Hello World!"); // Right-aligned in a 30-character field
    printf("%40s%10s%20s%15s\n", "Hell", "o", "World", "!"); // Each piece right-aligned in its own field
    printf("%10.7s\n", "Hello World!"); // Print only the first 7 characters, padded to width 10
    // Decimal ASCII codes for "Hello World!"; 'H' is right-aligned in a 100-character field
    printf("%100c%c%c%c%c%c%c%c%c%c%c%c\n",
           72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33);
    return 0;
}
```
### Formatting Specification

You'll notice we used a different character after the `%` for each argument. This is because `printf()` needs to know the type of each incoming argument so that it can format the string appropriately. For example, floating-point values are printed with a decimal point, while integers are not.
1 change: 0 additions & 1 deletion src/chapter2/stdlib.md

This file was deleted.

42 changes: 0 additions & 42 deletions src/chapter3/bash.md

This file was deleted.

46 changes: 2 additions & 44 deletions src/chapter3/challenges.md
@@ -1,45 +1,3 @@
# M3 Challenges
# Challenges

## Challenge 1

Navigate to your scratch directory and, using vim (or your chosen in-terminal editor) create a file called `hello.txt` that contains the text "Hello World". Once you have created the file, use the `cat` command to print the contents of the file to the screen.

## Challenge 2

Write a bash script that prints the contents of the above hello.txt file to the screen and run it locally (on your login node).

## Challenge 3

Submit the above script to the queue by writing another SLURM bash script. Check the status of the job using `squeue`. Once the job has finished, check the output using `cat`. You can find the output file in the directory you submitted the job from.

## Challenge 4

Request an interactive node and attach to it. Once you have done this, install Python 3.7 using conda.

## Challenge 5

Clone and run [this](./dl_on_m3/alexnet_stl10.py) script. You will need to first install the dependencies for it. You don't need to wait for it to finish, just make sure it is working. You will know it's working if it starts listing out the loss and accuracy for each epoch. You can stop it by pressing `ctrl + c`.

Once you have confirmed that it is working, deactivate and delete the conda environment, and then end the interactive session.

> Hint: I have included the dependencies and their versions (make sure you install the right version) in the `requirements.txt` file. You will need Python 3.7 to run this script.

## Challenge 6

Go back to the login node. Now you are going to put it all together. Write a bash script that does the following:

- (1) requests a compute node
- (2) installs python using conda
- (3) clones and runs the above script

Let this run fully. Check the output of the script to make sure it ran correctly. Does it match the output of the script you ran in challenge 5?
> Hint: You can check the output of the script at any time by `cat`ing the output file. The script does not need to have finished running for you to do this.

## Challenge 7

Edit your submission script so that you get a GPU node, and run the script using the GPU.
> Hint: Use the m3h partition

## Challenge 8

Now you want to clean up your working directory. First, push your solutions to your challenges repo. Then, delete the challenges directory, as well as the conda environment you created in challenge 6.
![under-const](../imgs/under-const.gif)
10 changes: 6 additions & 4 deletions src/chapter3/chapter3.md
@@ -1,7 +1,9 @@
# M3
# Operating Systems

[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides the majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores.
A decent chunk of HPC involves using low-level tools and techniques to find optimisations and make software run faster. The main reason we use C is that it gives us access to deeper parts of the computer that are normally hidden away and managed on your behalf by your Python or Java interpreter.

M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on.
![comp-levels](./imgs/programming-levels.jpg)

This book will take you through the basics of connecting to M3, submitting jobs, transferring data to and from the system and some other things. If you want to learn more about M3, you can read the [M3 documentation](https://docs.massive.org.au/M3/index.html). This will give you a more in-depth look at the system, and how to use it.
> **Note:** Not all low-level machine (Assembly) code is faster than high-level code. The primary reason lower-level code tends to be faster is that it avoids much of the overhead (e.g. garbage collection) involved in executing higher-level code.

If you have done FIT2100 Operating Systems, this chapter will mostly be a refresher for you. It is intended to give you a crash-course introduction to operating systems theory so that you can use low-level tools and implement things like cache optimisations.