
openshift docs missing quota section on extended resources #13447

@jeremyeder

Description


Our quota documentation
https://docs.openshift.com/container-platform/3.11/admin_guide/quota.html#managed-by-quota

is missing this section:
https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-for-extended-resources

Quota on extended resources such as GPUs nevertheless works fine on OCP 3.11:

# oc version
oc v3.11.53
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-35-90.us-west-2.compute.internal:8443
openshift v3.11.53
kubernetes v1.11.0+d4cacc0

An OpenShift node in my cluster has two GPUs available:

# oc describe node ip-172-31-27-209.us-west-2.compute.internal | egrep 'Capacity|Allocatable|gpu'
                    openshift.com/gpu-accelerator=true
Capacity:
 nvidia.com/gpu:  2
Allocatable:
 nvidia.com/gpu:  2
  nvidia.com/gpu  0           0

Define a quota of 1 GPU in the namespace "nvidia":

# cat gpu-quota.yaml 
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: nvidia
spec:
  hard:
    requests.nvidia.com/gpu: 1

Create the quota:

# oc create -f gpu-quota.yaml
resourcequota/gpu-quota created

Verify the namespace has the correct quota set:

# oc describe quota gpu-quota -n nvidia
Name:                    gpu-quota
Namespace:               nvidia
Resource                 Used  Hard
--------                 ----  ----
requests.nvidia.com/gpu  0     1

Run a pod that asks for a single GPU:

apiVersion: v1
kind: Pod
metadata:
  generateName: gpu-pod-
  namespace: nvidia
spec:
  restartPolicy: OnFailure
  containers:
  - name: rhel7-gpu-pod
    image: rhel7
    env:
      - name: NVIDIA_VISIBLE_DEVICES
        value: all
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: "compute,utility"
      - name: NVIDIA_REQUIRE_CUDA
        value: "cuda>=5.0"

    command: ["sleep"]
    args: ["infinity"]

    resources:
      limits:
        nvidia.com/gpu: 1

Create the pod:

# oc create -f gpu-pod.yaml

Verify that it is running:

# oc get pods
NAME              READY     STATUS      RESTARTS   AGE
gpu-pod-s46h7     1/1       Running     0          1m

Verify that the quota "Used" counter is correct:

# oc describe quota gpu-quota -n nvidia
Name:                    gpu-quota
Namespace:               nvidia
Resource                 Used  Hard
--------                 ----  ----
requests.nvidia.com/gpu  1     1

Now try to create a second GPU pod in the nvidia namespace; a second GPU is physically available on the node (it has two), but the quota allows only one:

# oc create -f gpu-pod.yaml 
Error from server (Forbidden): error when creating "gpu-pod.yaml": pods "gpu-pod-f7z2w" is forbidden: exceeded quota: gpu-quota, requested: requests.nvidia.com/gpu=1, used: requests.nvidia.com/gpu=1, limited: requests.nvidia.com/gpu=1

This Forbidden error is expected: the quota allows 1 GPU, and this pod requested a second GPU, which would have exceeded its quota.
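For completeness, if both GPUs on the node should be schedulable in this namespace, the quota can be raised to match the node's capacity shown above. A minimal sketch (the value 2 is an assumption matching this particular node; adjust for your hardware):

# cat gpu-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: nvidia
spec:
  hard:
    requests.nvidia.com/gpu: 2

Applying this with "oc replace -f gpu-quota.yaml" should let the second pod through admission.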
