CI: Detect changed paths and gate cuda.bindings tests#1908
CI: Detect changed paths and gate cuda.bindings tests#1908cpcloud merged 7 commits intoNVIDIA:mainfrom
Conversation
|
Add a detect-changes job to ci.yml that classifies which top-level modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder, cuda_python, cuda_python_test_helpers, shared infra) using dorny/paths-filter. The job emits composed gating outputs that account for the dependency graph (pathfinder -> bindings -> core). Thread a new skip-bindings-test input through the reusable test-wheel workflows so cuda.bindings tests (and their Cython counterparts) are skipped when the detect-changes output for test_bindings is false. For PRs that only touch cuda_core this skips the expensive bindings suite while still running cuda.core and cuda.pathfinder tests against the wheel built in the current CI run. Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST remains the test-time gate. This lets path-filter-based skips reuse the existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a cross-branch artifact fetch. Non-PR events (push to main, tag, schedule, workflow_dispatch) still exercise the full pipeline. Refs NVIDIA#299
Two fixes from code review: 1. Set `base: main` on dorny/paths-filter so backport PRs targeting non-default branches still diff against main for path detection. 2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The path-filter override only skips bindings tests; cuda.core Cython tests should still run on core-only PRs. Cython skip is now driven solely by CTK minor-version mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dorny/paths-filter v3 already diffs against the repo default branch for non-default-branch pushes. Explicit base: main was redundant and would produce wrong baselines for backport PRs targeting release branches (all files seen as changed vs main). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove third-party dorny/paths-filter action dependency. Use git merge-base + git diff --name-only + grep instead. Same behavior, zero supply-chain risk, full control over base ref resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
67f8baf to
d205955
Compare
rwgk
left a comment
There was a problem hiding this comment.
Generated with Cursor GPT-5.4 Extra High Fast
Reviewed the final tree for PR 1908 at commit d205955
actionlint passed on the touched workflows, so the notes below are about CI behavior / control flow, not YAML syntax.
1. High: detect-changes can fail without clearly failing the final required status
Evidence
- All three test jobs now depend on
detect-changes:.github/workflows/ci.yml:284.github/workflows/ci.yml:309.github/workflows/ci.yml:334
- The terminal
checksjob does not depend ondetect-changesdirectly and only inspects:doctest-linux-64test-linux-aarch64test-windows
at.github/workflows/ci.yml:365
- The shell logic in
checksonly treatsfailureandcancelledas errors:.github/workflows/ci.yml:394.github/workflows/ci.yml:398
Why this matters
- If
detect-changesfails, the downstream test jobs can becomeskippedbecause a required dependency did not complete successfully. - In that state,
checksmay still exit 0 because it does not treat thoseskippedresults as an error. - Result: a broken path-detection step could suppress or reduce CI coverage without producing a clearly failing final status.
Agent hint
- Re-check GitHub Actions
needssemantics foralways()plus an upstream failed prerequisite leading to downstreamskipped. - The most direct fix is probably one of:
- add
detect-changestochecks.needsand explicitly requireneeds.detect-changes.result == 'success' - or treat unexpected
skippedtest jobs as fatal whendoc_only != true
- add
- Preserve the intended
[doc-only]skip behavior while making accidental skips fail closed.
2. Medium: diff baseline is hardcoded to origin/main
Evidence
detect-changescomputes the merge base againstorigin/mainat.github/workflows/ci.yml:118.- The repo also maintains a release/backport branch in
ci/versions.yml:4:backport_branch: "12.9.x"
Why this matters
- For PR 1908, whose base is
main, this is fine. - If the same
pull-request/*CI path is ever used for PRs targeting12.9.xor another non-mainbranch, changed-path classification will be wrong. - Wrong classification can suppress outputs such as
test_bindings, which means under-testing on backport / release PRs.
Agent hint
- First confirm whether
pull-request/*branches are ever created for non-mainPR bases in this repo. - If yes, derive the true PR base branch before
git merge-baseinstead of assumingmain. - Because this workflow runs on pushes to
pull-request/*,github.base_refmay not be available; one workable approach is to resolve the PR number fromgithub.ref_nameand querygh pr view --json baseRefName. - If non-
mainPR bases are impossible by construction, add a short comment documenting that invariant so future readers do not "fix" it incorrectly. - This area already had churn within the PR history, so the next agent should validate the actual branch-generation semantics before changing behavior.
Bottom line
- I did not find a concrete bug in the
USE_BACKPORT_BINDINGS/SKIP_CUDA_BINDINGS_TESTsplit itself. - The two findings above are the only items that looked important enough to hand back to the PR author.
copy-pr-bot mirrors every PR to a pull-request/<N> branch regardless of whether the upstream PR targets main or a backport branch such as 12.9.x. Diffing against origin/main unconditionally therefore misclassifies changed paths on backport PRs and silently suppresses the cuda.bindings test matrix. Look up the real base ref via nv-gha-runners/get-pr-info (already used elsewhere in this repo) and pass it to git merge-base so the changed- paths classification matches the PR's actual target.
The checks job used if:always() plus shell-level result inspection of doc and the three test jobs, treating skipped as non-fatal to preserve intentional [doc-only] skips. detect-changes is a needs prerequisite of every test job but was absent from checks.needs, so a failure in the gating step silently cascaded into test-job skips and a green final status. Add detect-changes to checks.needs and require its result to be success. The legitimate doc-only skip path is unaffected because that leaves detect-changes itself successful.
|
Cursor GPT-5.4 Extra High Fast generated: Reviewed the current final tree for PR 1908 at commit c11f901. Findings1. Medium:
|
rwgk
left a comment
There was a problem hiding this comment.
I think this is fine to merge-as-is and work on the
detect-changes can under-classify cross-package renames
finding in a follow-on PR. I'm fine either way.
|
Thanks @rwgk — both findings addressed: Finding 1 (High) — Finding 2 (Medium) — diff baseline hardcoded to |
|
Thanks, @cpcloud @rwgk! I noticed this fails on the main branch, could you take a look? |
|
It's failing in the last line here: Maybe the |
* ci: detect changed paths and gate cuda.bindings tests Add a detect-changes job to ci.yml that classifies which top-level modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder, cuda_python, cuda_python_test_helpers, shared infra) using dorny/paths-filter. The job emits composed gating outputs that account for the dependency graph (pathfinder -> bindings -> core). Thread a new skip-bindings-test input through the reusable test-wheel workflows so cuda.bindings tests (and their Cython counterparts) are skipped when the detect-changes output for test_bindings is false. For PRs that only touch cuda_core this skips the expensive bindings suite while still running cuda.core and cuda.pathfinder tests against the wheel built in the current CI run. Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST remains the test-time gate. This lets path-filter-based skips reuse the existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a cross-branch artifact fetch. Non-PR events (push to main, tag, schedule, workflow_dispatch) still exercise the full pipeline. Refs NVIDIA#299 * fix: set explicit dorny base and decouple cython-test skip Two fixes from code review: 1. Set `base: main` on dorny/paths-filter so backport PRs targeting non-default branches still diff against main for path detection. 2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The path-filter override only skips bindings tests; cuda.core Cython tests should still run on core-only PRs. Cython skip is now driven solely by CTK minor-version mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove explicit dorny base, rely on v3 default dorny/paths-filter v3 already diffs against the repo default branch for non-default-branch pushes. Explicit base: main was redundant and would produce wrong baselines for backport PRs targeting release branches (all files seen as changed vs main). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace dorny/paths-filter with native git diff Remove third-party dorny/paths-filter action dependency. Use git merge-base + git diff --name-only + grep instead. Same behavior, zero supply-chain risk, full control over base ref resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: use herestring instead of echo pipe in has_match Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: resolve PR base branch at runtime for detect-changes diff copy-pr-bot mirrors every PR to a pull-request/<N> branch regardless of whether the upstream PR targets main or a backport branch such as 12.9.x. Diffing against origin/main unconditionally therefore misclassifies changed paths on backport PRs and silently suppresses the cuda.bindings test matrix. Look up the real base ref via nv-gha-runners/get-pr-info (already used elsewhere in this repo) and pass it to git merge-base so the changed- paths classification matches the PR's actual target. * ci: require detect-changes success in final checks aggregation The checks job used if:always() plus shell-level result inspection of doc and the three test jobs, treating skipped as non-fatal to preserve intentional [doc-only] skips. detect-changes is a needs prerequisite of every test job but was absent from checks.needs, so a failure in the gating step silently cascaded into test-job skips and a green final status. Add detect-changes to checks.needs and require its result to be success. The legitimate doc-only skip path is unaffected because that leaves detect-changes itself successful. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
detect-changesjob toci.ymlusing nativegit merge-base+git diff --name-onlyto classify which top-level modules changed in a PRcuda_pathfinder → cuda_bindings → cuda_coreskip-bindings-testinput throughtest-wheel-linux.ymlandtest-wheel-windows.ymlso cuda.bindings tests are skipped when only unrelated modules changedSKIP_CUDA_BINDINGS_TESTenv var into two orthogonal flags:USE_BACKPORT_BINDINGS(artifact fetch for CTK major mismatch) andSKIP_CUDA_BINDINGS_TEST(test-time gate)Closes #299
🤖 Generated with Claude Code