feat: add support for custom NVIDIA GPU device selection and capabilities #161
Allow users to specify custom GPU devices and driver capabilities for NVIDIA GPU workloads instead of using hardcoded defaults.

Changes:
- Add optional `pms.gpu.nvidia.devices` field (default: "all"). Supports GPU indices, UUIDs, or "all" for all GPUs.
- Add optional `pms.gpu.nvidia.capabilities` field (default: "compute,video,utility")
- Add documentation links to NVIDIA Container Toolkit docs
- Use Helm default function for cleaner template syntax

Users can now select specific GPUs (e.g., "0,1" or by UUID) and customize driver capabilities (compute, video, utility, graphics, etc.) according to their workload requirements.

References:
- GPU enumeration: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#gpu-enumeration
- Driver capabilities: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#driver-capabilities
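For illustration, a values override using the two fields described above might look like the sketch below. Only the `pms.gpu.nvidia.devices` and `pms.gpu.nvidia.capabilities` keys come from this PR. Any other settings the chart needs to turn on NVIDIA support are not shown here, and the chosen values are just examples.

```yaml
# Hypothetical values override; only these two keys are taken from the PR description.
pms:
  gpu:
    nvidia:
      # GPU indices, UUIDs, or "all" (the chart default)
      devices: "0,1"
      # Comma-separated driver capabilities (chart default: "compute,video,utility")
      capabilities: "compute,video,utility"
```

The template change itself is not quoted in this thread. A rough sketch of how Helm's `default` function could supply the fallbacks, assuming the values are passed to the container through the NVIDIA Container Toolkit's `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` environment variables and that the `pms.gpu.nvidia` map exists in the chart's values:

```yaml
# Hypothetical excerpt of a container spec; the chart's real template may differ.
env:
  - name: NVIDIA_VISIBLE_DEVICES
    # Falls back to "all" when the field is unset or empty
    value: {{ .Values.pms.gpu.nvidia.devices | default "all" | quote }}
  - name: NVIDIA_DRIVER_CAPABILITIES
    # Falls back to the previous hardcoded capability list
    value: {{ .Values.pms.gpu.nvidia.capabilities | default "compute,video,utility" | quote }}
```

Piping through `default` keeps each fallback on one line, where an `if`/`else` block would need several.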
Document new configurable fields for NVIDIA GPU support:
- `pms.gpu.nvidia.devices`: GPU device selection
- `pms.gpu.nvidia.capabilities`: driver capabilities configuration

Include links to official NVIDIA Container Toolkit documentation for GPU enumeration and driver capabilities. Bump chart version.
cilindrox
left a comment
LGTM - thanks @izphi78
Just a small nit regarding the versioning: this should be a minor bump to v1.3.0.
Co-authored-by: Gaston Festari <cilindrox@gmail.com>
Thank you for the quick response! :) Do I need to sign my commits? If so, shall I open a new PR or force-push the new commits?
cilindrox
left a comment
LGTM - thanks @izphi78
No need to sign them. I'll squash these and they'll get signed through GitHub's GUI.
Rename claimSecret.value -> claimSecret.key conditional check in the `statefulset.yaml` template.
Signed-off-by: Gaston Festari <cilindrox@gmail.com>
This has shipped in v1.3.0 - thanks @izphi78