Our quota documentation:
https://docs.openshift.com/container-platform/3.11/admin_guide/quota.html#managed-by-quota
is missing this section from the upstream Kubernetes documentation:
https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-for-extended-resources
However, quotas on extended resources such as GPUs work fine on OCP 3.11:
# oc version
oc v3.11.53
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-31-35-90.us-west-2.compute.internal:8443
openshift v3.11.53
kubernetes v1.11.0+d4cacc0
An OpenShift node in my cluster has two GPUs available:
# oc describe node ip-172-31-27-209.us-west-2.compute.internal | egrep 'Capacity|Allocatable|gpu'
openshift.com/gpu-accelerator=true
Capacity:
nvidia.com/gpu: 2
Allocatable:
nvidia.com/gpu: 2
nvidia.com/gpu 0 0
Set a quota of 1 GPU in the namespace "nvidia":
# cat gpu-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: nvidia
spec:
  hard:
    requests.nvidia.com/gpu: 1
Create the quota:
# oc create -f gpu-quota.yaml
resourcequota/gpu-quota created
Verify the namespace has the correct quota set:
# oc describe quota gpu-quota -n nvidia
Name: gpu-quota
Namespace: nvidia
Resource Used Hard
-------- ---- ----
requests.nvidia.com/gpu 0 1
Run a pod that asks for a single GPU:
apiVersion: v1
kind: Pod
metadata:
  generateName: gpu-pod-
  namespace: nvidia
spec:
  restartPolicy: OnFailure
  containers:
  - name: rhel7-gpu-pod
    image: rhel7
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: "compute,utility"
    - name: NVIDIA_REQUIRE_CUDA
      value: "cuda>=5.0"
    command: ["sleep"]
    args: ["infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1
# oc create -f gpu-pod.yaml
Verify that it is running:
# oc get pods
NAME READY STATUS RESTARTS AGE
gpu-pod-s46h7 1/1 Running 0 1m
Verify that the quota "Used" counter is correct:
# oc describe quota gpu-quota -n nvidia
Name: gpu-quota
Namespace: nvidia
Resource Used Hard
-------- ---- ----
requests.nvidia.com/gpu 1 1
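
Note that the pod spec only sets limits for nvidia.com/gpu, yet the quota counts it under requests.nvidia.com/gpu. This matches the upstream Kubernetes behavior where a container that specifies a limit but no request gets a request equal to the limit. The following is only a rough Python sketch of that accounting, not the actual quota controller code; the function names effective_requests and quota_usage are hypothetical and exist only for illustration.

```python
def effective_requests(resources):
    """Return the per-container requests, defaulting each missing
    request to the corresponding limit (assumed defaulting rule)."""
    requests = dict(resources.get("requests", {}))
    for name, quantity in resources.get("limits", {}).items():
        # Only fill in a request when the container did not set one.
        requests.setdefault(name, quantity)
    return requests


def quota_usage(containers, resource_name):
    """Sum the effective requests for one resource across containers."""
    return sum(
        int(effective_requests(c).get(resource_name, 0)) for c in containers
    )


# The resources stanza of the pod from this report: limits only.
gpu_pod_containers = [{"limits": {"nvidia.com/gpu": 1}}]

# Number of GPUs charged against requests.nvidia.com/gpu for this pod.
print(quota_usage(gpu_pod_containers, "nvidia.com/gpu"))
```
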
Now try to create a second GPU pod in the nvidia namespace. The node itself could still accommodate it, since it has 2 GPUs and only 1 is in use:
# oc create -f gpu-pod.yaml
Error from server (Forbidden): error when creating "gpu-pod.yaml": pods "gpu-pod-f7z2w" is forbidden: exceeded quota: gpu-quota, requested: requests.nvidia.com/gpu=1, used: requests.nvidia.com/gpu=1, limited: requests.nvidia.com/gpu=1
This Forbidden error is expected: the namespace has a quota of 1 GPU, and this pod tried to allocate a second GPU, which would have exceeded that quota.
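
The check behind that error can be sketched as a simple comparison of used + requested against hard for each resource named in the quota. This is a simplified Python illustration under that assumption, not the real admission plugin; the function name check_quota and the dictionary layout are hypothetical.

```python
def check_quota(hard, used, requested):
    """Return the quota resource names that the request would exceed.

    hard, used and requested map resource names to integer quantities.
    Resources that the quota does not mention are not restricted.
    """
    exceeded = []
    for name, amount in requested.items():
        if name in hard and used.get(name, 0) + amount > hard[name]:
            exceeded.append(name)
    return exceeded


hard = {"requests.nvidia.com/gpu": 1}
one_gpu = {"requests.nvidia.com/gpu": 1}

# First pod: nothing used yet, so the request fits within the quota.
first = check_quota(hard, {"requests.nvidia.com/gpu": 0}, one_gpu)

# Second pod: 1 already used, 1 more requested, hard limit is 1.
second = check_quota(hard, {"requests.nvidia.com/gpu": 1}, one_gpu)

print(first)
print(second)
```

With the values from this report, the first pod passes the check and the second is rejected for requests.nvidia.com/gpu, regardless of how many GPUs the node still has free.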