This repository was archived by the owner on May 6, 2026. It is now read-only.
Merged
2 changes: 1 addition & 1 deletion examples/demo_nvidia_dranet/deviceclass.yaml
@@ -19,4 +19,4 @@ metadata:
  spec:
  selectors:
  - cel:
- expression: device.driver == "dra.net"
+ expression: device.driver == "dra.net"
25 changes: 5 additions & 20 deletions examples/demo_nvidia_dranet/resourceclaims.yaml
@@ -11,11 +11,10 @@
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
-
  apiVersion: resource.k8s.io/v1beta1
  kind: ResourceClaimTemplate
  metadata:
- name: 2-gpu
+ name: 2-gpu-nic-aligned
  spec:
  spec:
  devices:
@@ -25,26 +24,12 @@ spec:
  count: 2
  selectors:
  - cel:
- expression: |
- device.attributes["gpu.nvidia.com"].index < 2
-
- ---
- apiVersion: resource.k8s.io/v1beta1
- kind: ResourceClaimTemplate
- metadata:
- name: 2-nic
- spec:
- spec:
- devices:
- requests:
+ expression: device.attributes["gpu.nvidia.com"].index <= 2
  - name: nic
  deviceClassName: dranet
  count: 2
  selectors:
  - cel:
- expression: device.attributes["dra.net"].rdma == true &&
- (
- (device.attributes["dra.net"].ifName.startsWith("gpu") &&
- device.attributes["dra.net"].ifName.endsWith("rdma0") &&
- int(device.attributes["dra.net"].ifName.substring(3, 4)) < 2)
- )
+ expression: device.attributes["dra.net"].rdma == true
+ constraints:
+ - matchAttribute: "resource.kubernetes.io/pcieRoot"
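Reviewer note: the merged template replaces the interface-name parsing with a `matchAttribute` constraint, so every device allocated for the claim must report the same `resource.kubernetes.io/pcieRoot` value. The sketch below is only a toy model of that rule, not the scheduler's actual allocation algorithm. The device inventory, the PCIe root names, and the helper `pick_aligned` are hypothetical and exist purely for illustration.

```python
from itertools import combinations

ATTR = "resource.kubernetes.io/pcieRoot"

# Hypothetical inventory: names, indices and PCIe roots are made up.
GPUS = [
    {"name": "gpu-0", "index": 0, ATTR: "pci0000:00"},
    {"name": "gpu-1", "index": 1, ATTR: "pci0000:00"},
    {"name": "gpu-2", "index": 2, ATTR: "pci0000:80"},
    {"name": "gpu-3", "index": 3, ATTR: "pci0000:80"},
]
NICS = [
    {"name": "nic-0", "rdma": True, ATTR: "pci0000:00"},
    {"name": "nic-1", "rdma": True, ATTR: "pci0000:00"},
    {"name": "nic-2", "rdma": False, ATTR: "pci0000:00"},
    {"name": "nic-3", "rdma": True, ATTR: "pci0000:80"},
]


def pick_aligned(gpus, nics, count=2):
    """Return the names of `count` GPUs and `count` NICs that all share
    one PCIe root, or None if no such combination exists."""
    # Per-request selectors, mirroring the CEL expressions in the template.
    gpu_ok = [d for d in gpus if d["index"] <= 2]
    nic_ok = [d for d in nics if d["rdma"] is True]

    # Claim-level constraint: every chosen device must carry the attribute
    # and the value must be identical across all of them.
    for gpu_set in combinations(gpu_ok, count):
        for nic_set in combinations(nic_ok, count):
            chosen = gpu_set + nic_set
            roots = {d.get(ATTR) for d in chosen}
            if len(roots) == 1 and None not in roots:
                return [d["name"] for d in chosen]
    return None


if __name__ == "__main__":
    print(pick_aligned(GPUS, NICS))
```

In this toy inventory the call selects `gpu-0`, `gpu-1`, `nic-0` and `nic-1`. If one of the two RDMA NICs on the first root is removed, no aligned set exists and the function returns `None`, which corresponds to the claim staying unallocated.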
35 changes: 14 additions & 21 deletions examples/demo_nvidia_dranet/statefulset.yaml
@@ -1,17 +1,12 @@
- # Copyright 2025 Google LLC
- #
- # Licensed under the Apache License, Version 2.0 (the "License");
- # you may not use this file except in compliance with the License.
- # You may obtain a copy of the License at
- #
- # https://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
-
+ apiVersion: v1
+ kind: Service
+ metadata:
+ name: nccl-gib-test
+ spec:
+ selector:
+ name: nccl-gib-test
+ clusterIP: None
+ ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
@@ -36,8 +31,8 @@ spec:
  capabilities:
  add: ["IPC_LOCK"]
  volumeMounts:
- # - name: library-dir-host
- # mountPath: /usr/local/nvidia
+ - name: library-dir-host
+ mountPath: /usr/local/nvidia
  - name: gib
  mountPath: /usr/local/gib
  - name: shared-memory
@@ -57,7 +52,7 @@ spec:
  sleep infinity
  resources:
  claims:
- - name: gpu
+ - name: gpu
  volumes:
  - name: library-dir-host
  hostPath:
@@ -71,11 +66,9 @@ spec:
  sizeLimit: 250Gi
  resourceClaims:
  - name: gpu
- resourceClaimTemplateName: 2-gpu
- - name: nic
- resourceClaimTemplateName: 2-nic
+ resourceClaimTemplateName: 2-gpu-nic-aligned
  tolerations:
  - key: "nvidia.com/gpu"
  operator: "Equal"
  value: "present"
- effect: "NoSchedule"
+ effect: "NoSchedule"
15 changes: 13 additions & 2 deletions site/content/docs/concepts/references.md
@@ -4,10 +4,21 @@ date: 2024-12-19T11:20:46Z
  ---

  - [The Kubernetes Network Driver Model: A Composable Architecture for High-Performance Networking](/docs/kubernetes_network_driver_model_dranet_paper.pdf) - This paper introduces the Kubernetes Network Driver model and provides a detailed performance evaluation of DraNet, demonstrating significant bandwidth improvements for AI/ML workloads.
+
+ {{< embed-pdf "/docs/kubernetes_network_driver_model_dranet_paper.pdf" >}}
+
+ - [The Challenges of AI/ML Multi-Node Workloads in Kubernetes - Antonio Ojea, Google - Regular SIG Network Meeting for 2025-07-17](https://www.youtube.com/playlist?list=PL69nYSiGNLP2E8vmnqo5MwPOY25sDWIxb)
+
+ <iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSBButnm46ReLtbtgBa2b4xkmr3oXEtH5yf10xsQ4fjcqF4jSOc5MzeZQUS02Ev2j6DKFj8vQAjCIoy/pubembed?start=true&loop=true&delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
+
+ - [Kubernetes Network Drivers, Antonio Ojea, Presentation](https://docs.google.com/presentation/d/1Vdr7BhbYXeWjwmLjGmqnUkvJr_eOUdU0x-JxfXWxUT8/edit?usp=sharing)
+ <iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRVritcaQFYkvaPuTPsxkgOt0ZfWhqYPcCjNN0UgZcEh9HR1yh3bFDXSOiPbPUayoMzbefZ_qvFoWCX/pubembed?start=true&loop=true&delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
+
  - [KEP 3063 - Dynamic Resource Allocation #306](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md)
  - [KEP 3695 - DRA: structured parameters #438](https://github.com/kubernetes/enhancements/issues/4381)
  - [Extend PodResources to include resources from Dynamic Resource Allocation (DRA)](https://github.com/kubernetes/enhancements/issues/3695)
  - [Working Group Device Management](https://github.com/kubernetes-sigs/wg-device-management)
- - [Kubernetes Network Drivers, Antonio Ojea, Presentation](https://docs.google.com/presentation/d/1Vdr7BhbYXeWjwmLjGmqnUkvJr_eOUdU0x-JxfXWxUT8/edit?usp=sharing)
+
+
  - [The Future of Kubernetes Networking - Antonio Ojea, Google & Dan Winship, Red Hat - Kubernetes Contributor Summit EU 2024](https://sched.co/1aOqO)
- - [Better Together! GPU, TPU and NIC Topological Alignment with DRA - John Belamaric, Google & Patrick Ohly, Intel - Kubecon US 2024](https://sched.co/1i7pv)
+ - [Better Together! GPU, TPU and NIC Topological Alignment with DRA - John Belamaric, Google & Patrick Ohly, Intel - Kubecon US 2024](https://sched.co/1i7pv)