remove delete-temp-images job #709
rapids-bot[bot] merged 2 commits into rapidsai:branch-24.10 from jameslamb:stop-deleting-temp-images
Those are both tested in

@jameslamb I am checking cuSpatial's CI with this PR: rapidsai/cuspatial#1441 (comment)

Great, thanks for pushing this forward @bdice . Looks like CI there did fail on these same notebooks (build link), and that rapidsai/cuspatial#1442 is up to hopefully fix it.

rapidsai/cuspatial#1442 has been merged so hopefully a rerun of this PR should pass the CI now

Thanks as always @mroeschke ! I just restarted all CI jobs here (assuming we need to do rebuilds to pull in the newest

@jameslamb There was a solver error so I dropped Python 3.9 to make CI pass. See b6e37e3.

Ok sure, works for me. Thank you!

jakirkham left a comment
Thanks James! 🙏
Also thanks everyone who reviewed and pushed fixes here and elsewhere! 🙏

Given the set of approvals and passing jobs, I think this is ready to merge. Thanks yall.

/merge

Thanks James and everyone for reviewing! 🙏

Follow-up to #708.
Proposes completely removing the delete-temp-images job, in favor of relying on the scheduled nightly cleanup at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.
Notes for Reviewers
Details
CI here writes images to the rapidsai/staging repo on DockerHub, then later copies them to individual user-facing repos.
To avoid those temporary CI artifacts piling up in the rapidsai/staging repo, pull requests and branch builds run a job called delete-temp-images, which does what it sounds like.
In exchange for more aggressive cleaning, this job introduces significant complexity for development here. Most notably, we've observed several instances where that job deletes images before all CI jobs needing them have completed successfully, leading to all of CI needing to be re-run.
Significant effort has been put into trying to avoid that, and we've found it's been difficult to get it right:
some attempts:
a recent example:
Ok so how will we clean up?
The workflow at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.
It runs once a day and deletes anything from rapidsai/staging that's more than 30 days old.
Benefits of these changes
As described in #708 (comment) ...
CI here will work as it does in other RAPIDS repos: if any jobs fail for retryable reasons (like network issues), you can safely click "re-run failed jobs" and make incremental progress towards all builds passing.
Also reduces the need to maintain code that has to keep up with the DockerHub API in two places (by deleting ci/delete-temp-images.sh here).
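For reviewers who want a concrete picture of the age-based rule described above (delete anything in rapidsai/staging more than 30 days old), here is a minimal sketch in Python. It is not the code in cleanup_staging.yaml: the function name select_stale_tags, the (tag name, last-pushed time) record shape, and the example tag names are all hypothetical, and no DockerHub API calls are made.

```python
from datetime import datetime, timedelta, timezone


def select_stale_tags(tags, now, max_age_days=30):
    """Return the names of tags last pushed more than max_age_days before now.

    `tags` is a hypothetical list of (name, pushed_at) pairs; the real
    workflow's data source and field names are not shown in this PR.
    """
    cutoff = now - timedelta(days=max_age_days)
    # Strictly older than the cutoff counts as stale; exactly 30 days is kept.
    return [name for name, pushed_at in tags if pushed_at < cutoff]


# Illustrative data only: tag names and timestamps are made up.
now = datetime(2024, 9, 1, tzinfo=timezone.utc)
tags = [
    ("example-tag-old", now - timedelta(days=45)),
    ("example-tag-recent", now - timedelta(days=2)),
]

print(select_stale_tags(tags, now))
```

Because the rule depends only on image age and not on the state of any CI run, an image pushed by a pull request stays available for later "re-run failed jobs" attempts until it ages out.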