Drop CUDA 12.2 Docker images #696
jameslamb left a comment:
Awesome, thanks for getting that deprecation notice into the 24.08 release.
Thanks Bradley and James! 🙏
I just noticed that tests are not running here on PRs anymore: https://github.com/rapidsai/docker/actions/runs/10392124806?pr=696
I missed that on #702, and I can put up a PR right now to fix it. It shouldn't block this particular PR, though, since this one is just deleting things.
Follow-up to #702 and #693. Created based on #696 (comment).

`test` jobs are not currently running on pull requests here, because they require `build-multiarch-manifest` jobs, which have this condition that causes such jobs to be skipped on PR builds: https://github.com/rapidsai/docker/blob/1c27d9245fd9d99ee35981b970acaf10961ca45b/.github/workflows/build-test-publish-images.yml#L171-L172

This PR ensures that `test` jobs always run on PRs, and that merging is blocked until they succeed.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Jake Awe (https://github.com/AyodeAwe)
- Ray Douglass (https://github.com/raydouglass)

URL: #708
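As a rough illustration of why the `test` jobs were skipped, here is a simplified Python model of GitHub Actions `needs` semantics: by default a job runs only if every job it needs succeeded, so a skipped `build-multiarch-manifest` job causes the `test` jobs that need it to be skipped as well. The job graph, the `build` job name, the `IS_PULL_REQUEST` flag, and the `run_workflow()` helper are hypothetical; the real condition is the one at the workflow lines linked in the commit message. The model also shows that an `always()` condition lets a dependent job run regardless of its dependencies, which is not necessarily how #708 implemented its fix.

```python
from typing import Union

Condition = Union[str, bool]  # "success()", "always()", or an already-evaluated boolean


def run_workflow(jobs: dict, would_succeed: dict) -> dict:
    """Hypothetical, simplified model: return "success", "failure" or "skipped" per job."""
    results: dict = {}

    def result_of(name: str) -> str:
        if name in results:
            return results[name]
        job = jobs[name]
        needed = [result_of(dep) for dep in job.get("needs", [])]
        deps_ok = all(r == "success" for r in needed)
        condition: Condition = job.get("if", "success()")
        if condition == "always()":
            should_run = True  # runs even if dependencies failed or were skipped
        elif condition == "success()":
            should_run = deps_ok  # the default when no `if` is given
        else:
            # A plain expression is implicitly combined with success().
            should_run = bool(condition) and deps_ok
        if should_run:
            results[name] = "success" if would_succeed.get(name, True) else "failure"
        else:
            results[name] = "skipped"
        return results[name]

    for name in jobs:
        result_of(name)
    return results


IS_PULL_REQUEST = True  # hypothetical flag standing in for the real PR check
jobs = {
    "build": {},
    # Hypothetical stand-in for the condition that skips manifests on PR builds.
    "build-multiarch-manifest": {"needs": ["build"], "if": not IS_PULL_REQUEST},
    "test": {"needs": ["build-multiarch-manifest"]},
}
print(run_workflow(jobs, would_succeed={}))
# → {'build': 'success', 'build-multiarch-manifest': 'skipped', 'test': 'skipped'}
```

Because a skipped dependency propagates as "skipped" rather than "failure", nothing in this model turns red on a PR; the tests simply never run, which matches the symptom reported above.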
@jameslamb There is an error in the CI that I am not expecting. Can you take a look? It seems related to the recent CI changes.
OK, I was confused by the mix of results at first, because some jobs were re-run and some weren't. Looking at just the first run, I see what happened: https://github.com/rapidsai/docker/actions/runs/10477159554/attempts/1?pr=696
Some […]

This is why I want to just remove the `delete-temp-images` job. This complexity of handling the difference between "failed" and "skipped" is just not worth it here, in my opinion, when we already have another mechanism for cleaning up old stuff. I'm going to put up a PR proposing that.

For now, for you here, you'll need to do a full re-run (or just wait for that other PR first).
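To make the failure mode concrete, here is a toy Python sketch, assuming a per-run cleanup job that deletes the run's temporary images from staging even when some jobs have not yet succeeded. The tag names and both helper functions are hypothetical. "Re-run failed jobs" re-runs only the failed test, not the builds that pushed its image, so once the image is gone the re-run cannot succeed and a full re-run is needed.

```python
def rerun_failed_test(staging_tags: set, image_tag: str) -> str:
    """Re-run a single failed test job; it can only pull images still in staging."""
    if image_tag in staging_tags:
        return "success"
    return "failure: image not found"


def simulate(eager_cleanup: bool) -> str:
    # Hypothetical tags pushed to the staging repo by this run's build jobs.
    staging = {"pr-696-amd64", "pr-696-arm64"}
    # First attempt: the arm64 test fails for a transient reason (e.g. a network error).
    failed_test_image = "pr-696-arm64"
    if eager_cleanup:
        # The per-run cleanup job ran anyway and deleted this run's temporary images.
        staging.clear()
    # "Re-run failed jobs" re-runs only the failed test, not the builds.
    return rerun_failed_test(staging, failed_test_image)


print(simulate(eager_cleanup=True))   # → failure: image not found
print(simulate(eager_cleanup=False))  # → success
```

Leaving images in place until a time-based cleanup removes them is what makes incremental re-runs safe in this model.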
Follow-up to #708. Proposes completely removing the `delete-temp-images` job, in favor of relying on the scheduled nightly cleanup at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.

## Notes for Reviewers

### Details

CI here writes images to the `rapidsai/staging` repo on DockerHub, then later copies them to individual user-facing repos. To avoid those temporary CI artifacts piling up in the `rapidsai/staging` repo, pull requests and branch builds run a job called `delete-temp-images`, which does what it sounds like.

In exchange for more aggressive cleaning, this job introduces significant complexity for development here. Most notably, we've observed several instances where that job deletes images before all CI jobs needing them have completed successfully, leading to all of CI needing to be re-run. Significant effort has been put into trying to avoid that, and we've found it difficult to get right.

Some attempts:

* #702
* #708

A recent example:

* #696 (comment)

### Ok so how will we clean up?

Via the workflow at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml. It runs once a day and deletes anything from `rapidsai/staging` that's more than 30 days old.

### Benefits of these changes

As described in #708 (comment):

> ... CI here will work as it does in other RAPIDS repos.... if any jobs fail for retryable reasons (like network issues), you can safely click "re-run failed jobs" and make incremental progress towards all builds passing.

This also reduces the need to maintain code that has to keep up with the DockerHub API in two places (by deleting `ci/delete-temp-images.sh` here).

Authors:
- James Lamb (https://github.com/jameslamb)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Ray Douglass (https://github.com/raydouglass)
- https://github.com/jakirkham

URL: #709
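The retention rule of the nightly cleanup ("deletes anything from `rapidsai/staging` that's more than 30 days old") can be sketched as a simple age filter. This is a hypothetical Python illustration, not the code in `cleanup_staging.yaml`: the `tags_to_delete()` helper and the sample tag names are made up, it assumes each tag carries a timezone-aware "last updated" timestamp, and it does not call the DockerHub API.

```python
from datetime import datetime, timedelta, timezone


def tags_to_delete(tags: dict, now: datetime,
                   max_age: timedelta = timedelta(days=30)) -> list:
    """Return names of tags whose last update is more than max_age before now."""
    return sorted(name for name, last_updated in tags.items()
                  if now - last_updated > max_age)


now = datetime(2024, 8, 30, tzinfo=timezone.utc)
# Hypothetical staging tags mapped to their last-updated times.
tags = {
    "old-pr-build": now - timedelta(days=45),
    "exactly-30-days": now - timedelta(days=30),
    "fresh-pr-build": now - timedelta(days=2),
}
print(tags_to_delete(tags, now))  # → ['old-pr-build']
```

A tag that is exactly 30 days old is kept because the comparison is strict ("more than 30 days"); any image an in-progress PR still needs is far younger than that, which is why re-runs stay safe under this scheme.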
I've merged latest […]. If any builds fail with temporary issues like network errors, just re-running those failed jobs should be safe and successful.
Most build jobs here were failing with what look like temporary errors. I've restarted them.
/merge
Thanks all! 🙏


Following rapidsai/docs#526, we can remove CUDA 12.2 from the RAPIDS 24.10 Docker images.