We added some g4ad AWS instances to our development cluster so we could experiment more with AMD GPUs. Just wanted to document some of the steps that had to be taken here.
Installing the drivers
We started from a Rocky 9.6 image. The installation steps are reasonably straightforward, but building the kernel modules is quite slow:
sudo yum update
# See https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html#rocm-installation
sudo dnf install https://repo.radeon.com/amdgpu-install/6.4.1/rhel/9.6/amdgpu-install-6.4.60401-1.el9.noarch.rpm
# Install the drivers
sudo amdgpu-install --usecase=dkms # this takes ages
# Needs a reboot after installing the kernel module if the kernel is updated during the yum update
# sudo reboot
# Test that we can see the GPU
sudo yum install amd-smi-lib libdrm libdrm-amdgpu
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/lib:/opt/rocm/lib64
/opt/rocm/bin/amd-smi list
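A quick sanity check that the DKMS build succeeded and the driver is actually loaded (nothing here is specific to this setup beyond the module name):
# Check the DKMS build status of the amdgpu module
dkms status
# Confirm the module is loaded and see the kernel initialising the GPU
lsmod | grep amdgpu
sudo dmesg | grep -i amdgpu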
Until there is support in Magic Castle for this part, you would need to create a custom image using this approach (via the prepare4image.sh workflow).
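For reference, the manual equivalent of that workflow is roughly: boot an instance from the base Rocky 9.6 image, run the installation steps above, and snapshot the result. A hedged sketch using the AWS CLI (the instance ID and names are placeholders; prepare4image.sh presumably does extra clean-up before the snapshot):
# Snapshot the prepared instance into a reusable AMI (ID and names are placeholders)
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name rocky-9.6-rocm-6.4.1 \
  --description "Rocky 9.6 with amdgpu DKMS driver and ROCm 6.4.1"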
Making Slurm aware of the GPU on the node
I did this in a very hack-ish way (acting based only on the node name), but basically you need to add the node's capabilities to the gres.conf on the management node and on the compute node itself. The template gres.conf.epp was
###########################################################
# Slurm's Generic Resource (GRES) configuration file
###########################################################
AutoDetect=off
<% $nodes.each |$name, $attr| { -%>
<% if $name =~ /rocm/ { -%>
NodeName=<%= $name %> Name=gpu Type=rocm Count=1 File=/dev/kfd
<% } elsif $attr['specs']['gpus'] > 0 { -%>
<% if $attr['specs']['mig'] and ! $attr['specs']['mig'].empty { -%>
<% $attr['specs']['mig'].map |$key, $value| { -%>
NodeName=<%= $name %> Name=gpu Type=<%= $key %> Count=<%= $value * $attr['specs']['gpus'] %> File=<%= join(range(0, $value * $attr['specs']['gpus'] - 1).map |$i| { "/dev/nvidia-mig-${key}-${i}" }, ',') %>
<% } -%>
<% } else { -%>
NodeName=<%= $name %> Name=gpu Count=<%= $attr['specs']['gpus'] %> File=<%= join(range(0, $attr['specs']['gpus'] - 1).map |$i| { "/dev/nvidia${i}" }, ',') %>
<% } -%>
<% } -%>
<% } -%>
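For the compute node used below, that template renders to something along the lines of:
###########################################################
# Slurm's Generic Resource (GRES) configuration file
###########################################################
AutoDetect=off
NodeName=x86-64-rocm-zen2-node1 Name=gpu Type=rocm Count=1 File=/dev/kfd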
This automatically fixes gres.conf on the management node, but to get a gres.conf for the compute nodes I needed to tweak slurm.pp to include
if $facts['networking']['hostname'] =~ /rocm/ {
file { '/etc/slurm/gres.conf':
ensure => 'present',
owner => 'slurm',
group => 'slurm',
content => epp('profile/slurm/gres.conf', {
'nodes' => {
$facts['networking']['hostname'] => {},
},
}),
seltype => 'etc_t',
}
}
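Once the file exists on both sides, the Slurm daemons need to pick it up; a sketch of the manual steps (the Puppet-managed services may already do this on their own):
# On the management node: re-read slurm.conf and gres.conf
sudo scontrol reconfigure
# On the compute node: restart slurmd so it registers the GRES
sudo systemctl restart slurmd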
This is enough to allow Slurm to register the node, and then in an interactive job you can see
[ocaisa@x86-64-rocm-zen2-node1 ~]$ scontrol show node x86-64-rocm-zen2-node1
NodeName=x86-64-rocm-zen2-node1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=8 CPUEfctv=8 CPUTot=8 CPULoad=0.07
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:1
NodeAddr=10.0.0.12 NodeHostName=x86-64-rocm-zen2-node1 Version=24.05.8
OS=Linux 5.14.0-570.25.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 7 18:09:10 UTC 2025
RealMemory=32768 AllocMem=28672 FreeMem=28818 Sockets=8 Boards=1
MemSpecLimit=512
State=ALLOCATED+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=5 Owner=N/A MCS_label=N/A
Partitions=cpubase_bycore_b1,x86-64-rocm-zen2-node
BootTime=2025-07-18T14:49:28 SlurmdStartTime=2025-07-18T14:52:17
LastBusyTime=2025-07-18T14:52:17 ResumeAfterTime=None
CfgTRES=cpu=8,mem=32G,billing=8,gres/gpu=1
AllocTRES=cpu=8,mem=28G
CurrentWatts=0 AveWatts=0
[ocaisa@x86-64-rocm-zen2-node1 ~]$ /opt/rocm/bin/amd-smi list
WARNING: User is missing the following required groups: render, video. Please add user to these groups.
GPU: 0
BDF: 0000:00:1e.0
UUID: 73ff7362-0000-1000-802c-73466b6c6923
KFD_ID: 21974
NODE_ID: 1
PARTITION_ID: 0
As you can see, it complains that the user is not a member of the groups that are allowed to use the GPU. One way to circumvent this is described at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/prerequisites.html#grant-gpu-access-to-all-users-on-the-system
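A more targeted alternative is simply adding the account to the two groups named in the warning; whether that is practical depends on how users are provisioned on the cluster (e.g. local users vs. FreeIPA):
# Add the current user to the groups that own /dev/kfd and /dev/dri/renderD*
sudo usermod -aG render,video "$USER"
# The change only takes effect in new login sessions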
The general structure for the devices is
/dev/kfd
/dev/dri/card0
/dev/dri/renderD128
...
/dev/dri/cardN
/dev/dri/renderD<128+N>
/dev/dri/by-path/pci-<bus>:00.0-card0
/dev/dri/by-path/pci-<bus>:00.0-render0
...
/dev/dri/by-path/pci-<bus>:00.0-cardN
/dev/dri/by-path/pci-<bus>:00.0-renderN
where, to my understanding, if you are computing on GPU N you only need access to /dev/dri/renderD<128+N>. Above I used /dev/kfd, which gives access to all GPUs; since I only have one, it makes no difference. If you had multiple GPUs you would probably want a gres.conf like
Name=gpu Type=amd File=/dev/dri/renderD128
Name=gpu Type=amd File=/dev/dri/renderD129
so you can schedule individual GPUs.
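With either layout, jobs can then request the GPU explicitly; for example (the Type value must match what gres.conf declares, rocm above or amd in the multi-GPU sketch):
# Request one AMD GPU in an interactive job
salloc --gres=gpu:rocm:1
# Or run a quick check against the per-render-node layout
srun --gres=gpu:amd:1 /opt/rocm/bin/amd-smi list
Note that whether a job is actually confined to its allocated device additionally depends on Slurm's cgroup device constraints (ConstrainDevices in cgroup.conf).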