jakirkham
left a comment
Thanks Bradley! 🙏
Had a couple of questions about the hardware we test on. Otherwise this looks good.
- { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'h100', DRIVER: 'latest', DEPENDENCIES: 'latest' }
- { ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '12.0.1', LINUX_VER: 'rockylinux8', GPU: 'l4', DRIVER: 'latest', DEPENDENCIES: 'latest' }
- { ARCH: 'amd64', PY_VER: '3.10', CUDA_VER: '12.2.2', LINUX_VER: 'rockylinux8', GPU: 'l4', DRIVER: 'earliest', DEPENDENCIES: 'oldest' }
- { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'l4', DRIVER: 'latest', DEPENDENCIES: 'latest' }
Do we want one of these to be an H100 (as was the case before)?
Suggested change:
-  - { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'l4', DRIVER: 'latest', DEPENDENCIES: 'latest' }
+  - { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'h100', DRIVER: 'latest', DEPENDENCIES: 'latest' }
Previously we had 3 L4 jobs and 2 H100 jobs.
We have much more L4 capacity than H100 capacity, so I went with 3 L4 jobs and 1 H100 job rather than 2 L4 jobs and 2 H100 jobs. Jobs queue up on the H100 nodes fairly often, so this balance should work out better.
- { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'h100', DRIVER: 'latest', DEPENDENCIES: 'latest' }
- { ARCH: 'amd64', PY_VER: '3.12', CUDA_VER: '12.0.1', LINUX_VER: 'rockylinux8', GPU: 'l4', DRIVER: 'latest', DEPENDENCIES: 'latest' }
- { ARCH: 'amd64', PY_VER: '3.10', CUDA_VER: '12.2.2', LINUX_VER: 'rockylinux8', GPU: 'l4', DRIVER: 'earliest', DEPENDENCIES: 'oldest' }
- { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'l4', DRIVER: 'latest', DEPENDENCIES: 'latest' }
Also curious whether we want to use an H100 here.
Suggested change:
-  - { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'l4', DRIVER: 'latest', DEPENDENCIES: 'latest' }
+  - { ARCH: 'amd64', PY_VER: '3.11', CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', GPU: 'h100', DRIVER: 'latest', DEPENDENCIES: 'latest' }
Testing in cuDF before merging: rapidsai/cudf#20158
Tests passed downstream (with some known unrelated failures). Merging.
This drops CUDA 12.0 from the CI matrix.
xref: rapidsai/build-planning#223
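
For illustration only, dropping a CUDA minor version from a matrix amounts to filtering entries by their `CUDA_VER` value. The sketch below is an assumption-laden example, not the actual change in this PR: the `matrix` list copies the four entries quoted in the review above (the real workflow file likely has more), and `drop_cuda_prefix` is a hypothetical helper name.

```python
# Hypothetical stand-in for a few CI matrix entries. The values are copied
# from the lines quoted in the review, not read from the real workflow file.
matrix = [
    {"ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "12.2.2", "LINUX_VER": "ubuntu22.04",
     "GPU": "h100", "DRIVER": "latest", "DEPENDENCIES": "latest"},
    {"ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.0.1", "LINUX_VER": "rockylinux8",
     "GPU": "l4", "DRIVER": "latest", "DEPENDENCIES": "latest"},
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "12.2.2", "LINUX_VER": "rockylinux8",
     "GPU": "l4", "DRIVER": "earliest", "DEPENDENCIES": "oldest"},
    {"ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "12.2.2", "LINUX_VER": "ubuntu22.04",
     "GPU": "l4", "DRIVER": "latest", "DEPENDENCIES": "latest"},
]


def drop_cuda_prefix(entries, prefix="12.0."):
    """Return the entries whose CUDA_VER does not start with `prefix`."""
    return [e for e in entries if not e["CUDA_VER"].startswith(prefix)]


remaining = drop_cuda_prefix(matrix)

# Print what is left so the effect of the filter is easy to inspect.
for entry in remaining:
    print(entry["CUDA_VER"], entry["LINUX_VER"], entry["GPU"])
```

Matching on the prefix `"12.0."` rather than `"12.0"` avoids accidentally catching an unrelated version string that merely begins with the same characters.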