Containers running on nodes that support NVIDIA drivers should use the `nvidia-container-toolkit` to provide proper driver integration. This feature should include:
- Admin documentation for
  - Installing `nvidia-driver` from official Debian repos
  - Installing `nvidia-container-toolkit` from NVIDIA sources (link to NVIDIA docs)
  - Configuring the `/usr/share/lxc/hooks/nvidia` hook script for API use by symlinking it into `/var/lib/vz/snippets` (open question: is this the best way?)
- Container creator updates to
  - Identify NVIDIA nodes (boolean in the Nodes model? autodetected based on hook script presence?)
  - Add the `NVIDIA_VISIBLE_DEVICES=all` and `NVIDIA_DRIVER_CAPABILITIES=utility compute` environment variables and the hook script to containers created on NVIDIA nodes
  - Add a "GPU Required" boolean to the Containers model to force creation on a Node with a GPU? (Unnecessary if GPU Nodes are in their own sites, but necessary if we have mixed sites; this would require the boolean in the Nodes model rather than autodetection.)
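
To make the container creator options above easier to discuss, here is a rough Python sketch of how node identification, the "GPU Required" check, and the extra NVIDIA settings could fit together. It is only an illustration under assumptions: `Node`, `ContainerRequest`, `has_gpu`, `gpu_required`, `detect_nvidia`, `pick_node`, and `nvidia_settings` are hypothetical names rather than the project's actual models or API. It also does not show how the settings would be passed to Proxmox. Only the hook path and the two environment variables come from this issue.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional

# Hook path and environment variables as named in this issue.
NVIDIA_HOOK = Path("/usr/share/lxc/hooks/nvidia")
NVIDIA_ENV = {
    "NVIDIA_VISIBLE_DEVICES": "all",
    "NVIDIA_DRIVER_CAPABILITIES": "utility compute",
}


@dataclass
class Node:
    """Hypothetical stand-in for the Nodes model."""
    name: str
    has_gpu: bool = False  # the explicit-boolean option


@dataclass
class ContainerRequest:
    """Hypothetical stand-in for the Containers model."""
    hostname: str
    gpu_required: bool = False  # the proposed "GPU Required" flag


def detect_nvidia(hook: Path = NVIDIA_HOOK) -> bool:
    """Autodetection option: the hook script exists on this machine.

    Only meaningful when executed on the node itself.
    """
    return hook.is_file()


def pick_node(nodes: List[Node], request: ContainerRequest) -> Optional[Node]:
    """Return the first node that satisfies the request, or None."""
    for node in nodes:
        if request.gpu_required and not node.has_gpu:
            continue  # skip non-GPU nodes when a GPU is required
        return node
    return None


def nvidia_settings(node: Node) -> Dict[str, object]:
    """Extra settings to attach to a container created on `node`."""
    if not node.has_gpu:
        return {"environment": {}, "hook": None}
    return {"environment": dict(NVIDIA_ENV), "hook": str(NVIDIA_HOOK)}


# Usage: a mixed site with one CPU-only node and one GPU node.
nodes = [Node("cpu-1"), Node("gpu-1", has_gpu=True)]
target = pick_node(nodes, ContainerRequest("trainer", gpu_required=True))
print(target.name)  # prints "gpu-1"
```

In this sketch the scheduler relies on the stored `has_gpu` boolean, because a check like `detect_nvidia` can only run on the node itself. This matches the note above that mixed sites would need the boolean in the Nodes model.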