avoid triggering nightly tests until builds are complete #408
rapids-bot[bot] merged 14 commits into branch-25.10
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.

/ok to test

…ightly-tests-after-builds

If you still see the issue in doc, can you please update the link to https://docs.nvidia.com/ngc/latest/ngc-private-registry-user-guide.html#generating-a-personal-api-key

Ok yep will do! And I think this new error:
Is a result of rapidsai/rmm#2036. It should be fixed by other RAPIDS packages being rebuilt, which @bdice triggered here: https://github.com/rapidsai/workflows/actions/runs/17949845053

Looks like that was not enough, probably for the reasons being discussed in rapidsai/build-planning#218. This should hopefully be resolved later today when the RAPIDS Ops team deletes some nightly packages to allow new ones to be published.

This did fail again 😭 https://github.com/NVIDIA/cuopt/actions/runs/17962030366/job/51090641582?pr=408
Updated those links in the way you suggested: 386dad2

/merge
Replaces NVIDIA#359 (my more-complicated earlier attempt at this)

This project runs nightly builds and tests on a cron schedule:

https://github.com/NVIDIA/cuopt/blob/36a6a1c0edf42cec2cf07c6be3f16531f33515de/.github/workflows/nightly.yaml#L1-L6

Tests need to wait for builds to finish, and that's currently done with some shell scripts that hit the GitHub API, using a mix of `sleep` and polling.

This has sometimes resulted in nightly failures (network errors, timeouts, etc.). This PR proposes reducing the risk of such failures by moving that logic into GitHub Actions configuration directly, specifically:

* making `build.yaml` trigger `test.yaml` with the GitHub CLI **only after all package builds and publishing have finished**

## Issue

Contributes to NVIDIA#122

## Notes for Reviewers

### How I tested this

I manually triggered this run of the "Trigger Nightly cuOpt Pipeline": https://github.com/NVIDIA/cuopt/actions/runs/17935159871

Which triggered this `build` run: https://github.com/NVIDIA/cuopt/actions/runs/17935161536

Which triggered this `test` run: https://github.com/NVIDIA/cuopt/actions/runs/17936474025

Things look ok to me! The `test` run was not triggered until after all the relevant package builds and uploads were done, and BEFORE the docker image builds were done (as intended, to not be delayed waiting on them).

There are some test failures from artifact-downloading, like this:

```text
[rapids-github-run-id] Querying the GitHub API to determine relevant run of 'build.yaml'.
Downloading and decompressing cuopt_wheel_python_cuopt_server_cu12_py312_x86_64 from Run ID 17936253863 into /tmp/tmp.pqrBXIhMlP
```

But I think they'll be fixed by merging NVIDIA#409

And the naming changes for the image builds look good 😁

<img width="317" height="203" alt="image" src="https://github.com/user-attachments/assets/31bac7bd-1c4d-4c31-9ce9-9863778c2e89" />

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Ramakrishnap (https://github.com/rgsl888prabhu)

URL: NVIDIA#408
Description
Replaces #359 (my more-complicated earlier attempt at this)
This project runs nightly builds and tests on a cron schedule:
cuopt/.github/workflows/nightly.yaml
Lines 1 to 6 in 36a6a1c
Tests need to wait for builds to finish, and that's currently done with some shell scripts that hit the GitHub API, using a mix of `sleep` and polling.
This has sometimes resulted in nightly failures (network errors, timeouts, etc.). This PR proposes reducing the risk of such failures by moving that logic into GitHub Actions configuration directly, specifically:
* making `build.yaml` trigger `test.yaml` with the GitHub CLI **only after all package builds and publishing have finished**
Issue
Contributes to #122
Notes for Reviewers
How I tested this
I manually triggered this run of the "Trigger Nightly cuOpt Pipeline": https://github.com/NVIDIA/cuopt/actions/runs/17935159871
Which triggered this `build` run: https://github.com/NVIDIA/cuopt/actions/runs/17935161536
Which triggered this `test` run: https://github.com/NVIDIA/cuopt/actions/runs/17936474025
Things look ok to me!
The `test` run was not triggered until after all the relevant package builds and uploads were done, and BEFORE the docker image builds were done (as intended, to not be delayed waiting on them).
There are some test failures from artifact-downloading, like this:
But I think they'll be fixed by merging #409
And the naming changes for the image builds look good 😁
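For anyone skimming, here is a rough sketch of the pattern this PR describes: a final job in `build.yaml` that uses the GitHub CLI to dispatch `test.yaml` once the package jobs are done. This is not copied from the cuOpt workflows. The job names (`package-publish`, `trigger-tests`) and the placeholder step are made up for illustration, and it assumes `test.yaml` declares a `workflow_dispatch` trigger.

```yaml
# Hypothetical excerpt of build.yaml. Job names are illustrative only,
# not the real ones in the cuOpt repository.
jobs:
  package-publish:
    # Placeholder standing in for the real package build and publish jobs.
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and publish packages here"

  trigger-tests:
    # Starts only after every job listed in `needs` has succeeded.
    # Docker image build jobs are deliberately left out of `needs`,
    # so tests are not delayed waiting on them.
    needs: [package-publish]
    runs-on: ubuntu-latest
    permissions:
      actions: write  # allows the workflow token to dispatch another workflow
    steps:
      - name: Trigger test.yaml
        env:
          GH_TOKEN: ${{ github.token }}  # the gh CLI reads its token from GH_TOKEN
        run: |
          gh workflow run test.yaml \
            --repo "${{ github.repository }}" \
            --ref "${{ github.ref_name }}"
```

The idea is that `needs` lets GitHub Actions itself handle the ordering, so no `sleep` loop or API polling is required to know when the builds are finished.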