Skip to content

CI: Detect changed paths and gate cuda.bindings tests#1908

Merged
cpcloud merged 7 commits intoNVIDIA:mainfrom
cpcloud:ci-detect-changes-299
Apr 17, 2026
Merged

CI: Detect changed paths and gate cuda.bindings tests#1908
cpcloud merged 7 commits intoNVIDIA:mainfrom
cpcloud:ci-detect-changes-299

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Apr 14, 2026

Summary

  • Adds a detect-changes job to ci.yml using native git merge-base + git diff --name-only to classify which top-level modules changed in a PR
  • Composes gating outputs respecting the dependency graph: cuda_pathfinder → cuda_bindings → cuda_core
  • Threads skip-bindings-test input through test-wheel-linux.yml and test-wheel-windows.yml so cuda.bindings tests are skipped when only unrelated modules changed
  • Splits the overloaded SKIP_CUDA_BINDINGS_TEST env var into two orthogonal flags: USE_BACKPORT_BINDINGS (artifact fetch for CTK major mismatch) and SKIP_CUDA_BINDINGS_TEST (test-time gate)
  • Non-PR events (push to main, tag, schedule) unconditionally run everything
  • Build-side gating outputs are emitted but not yet consumed (follow-up PR)

Closes #299

🤖 Generated with Claude Code

@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
@cpcloud cpcloud added enhancement Any code-related improvements P0 High priority - Must do! CI/CD CI/CD infrastructure labels Apr 14, 2026
@cpcloud cpcloud self-assigned this Apr 14, 2026
@github-actions
Copy link
Copy Markdown

cpcloud and others added 5 commits April 14, 2026 18:15
Add a detect-changes job to ci.yml that classifies which top-level
modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder,
cuda_python, cuda_python_test_helpers, shared infra) using
dorny/paths-filter. The job emits composed gating outputs that account
for the dependency graph (pathfinder -> bindings -> core).

Thread a new skip-bindings-test input through the reusable test-wheel
workflows so cuda.bindings tests (and their Cython counterparts) are
skipped when the detect-changes output for test_bindings is false. For
PRs that only touch cuda_core this skips the expensive bindings suite
while still running cuda.core and cuda.pathfinder tests against the
wheel built in the current CI run.

Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars
into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport
branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST
remains the test-time gate. This lets path-filter-based skips reuse the
existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a
cross-branch artifact fetch.

Non-PR events (push to main, tag, schedule, workflow_dispatch) still
exercise the full pipeline.

Refs NVIDIA#299
Two fixes from code review:

1. Set `base: main` on dorny/paths-filter so backport PRs targeting
   non-default branches still diff against main for path detection.

2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The
   path-filter override only skips bindings tests; cuda.core Cython
   tests should still run on core-only PRs. Cython skip is now driven
   solely by CTK minor-version mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dorny/paths-filter v3 already diffs against the repo default branch
for non-default-branch pushes. Explicit base: main was redundant and
would produce wrong baselines for backport PRs targeting release
branches (all files seen as changed vs main).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove third-party dorny/paths-filter action dependency. Use
git merge-base + git diff --name-only + grep instead. Same
behavior, zero supply-chain risk, full control over base ref
resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the ci-detect-changes-299 branch from 67f8baf to d205955 Compare April 14, 2026 22:15
Copy link
Copy Markdown
Contributor

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated with Cursor GPT-5.4 Extra High Fast


Reviewed the final tree for PR 1908 at commit d205955

actionlint passed on the touched workflows, so the notes below are about CI behavior / control flow, not YAML syntax.

1. High: detect-changes can fail without clearly failing the final required status

Evidence

  • All three test jobs now depend on detect-changes:
    • .github/workflows/ci.yml:284
    • .github/workflows/ci.yml:309
    • .github/workflows/ci.yml:334
  • The terminal checks job does not depend on detect-changes directly and only inspects:
    • doc
    • test-linux-64
    • test-linux-aarch64
    • test-windows
      at .github/workflows/ci.yml:365
  • The shell logic in checks only treats failure and cancelled as errors:
    • .github/workflows/ci.yml:394
    • .github/workflows/ci.yml:398

Why this matters

  • If detect-changes fails, the downstream test jobs can become skipped because a required dependency did not complete successfully.
  • In that state, checks may still exit 0 because it does not treat those skipped results as an error.
  • Result: a broken path-detection step could suppress or reduce CI coverage without producing a clearly failing final status.

Agent hint

  • Re-check GitHub Actions needs semantics for always() plus an upstream failed prerequisite leading to downstream skipped.
  • The most direct fix is probably one of:
    • add detect-changes to checks.needs and explicitly require needs.detect-changes.result == 'success'
    • or treat unexpected skipped test jobs as fatal when doc_only != true
  • Preserve the intended [doc-only] skip behavior while making accidental skips fail closed.

2. Medium: diff baseline is hardcoded to origin/main

Evidence

  • detect-changes computes the merge base against origin/main at .github/workflows/ci.yml:118.
  • The repo also maintains a release/backport branch in ci/versions.yml:4:
    • backport_branch: "12.9.x"

Why this matters

  • For PR 1908, whose base is main, this is fine.
  • If the same pull-request/* CI path is ever used for PRs targeting 12.9.x or another non-main branch, changed-path classification will be wrong.
  • Wrong classification can suppress outputs such as test_bindings, which means under-testing on backport / release PRs.

Agent hint

  • First confirm whether pull-request/* branches are ever created for non-main PR bases in this repo.
  • If yes, derive the true PR base branch before git merge-base instead of assuming main.
  • Because this workflow runs on pushes to pull-request/*, github.base_ref may not be available; one workable approach is to resolve the PR number from github.ref_name and query gh pr view --json baseRefName.
  • If non-main PR bases are impossible by construction, add a short comment documenting that invariant so future readers do not "fix" it incorrectly.
  • This area already had churn within the PR history, so the next agent should validate the actual branch-generation semantics before changing behavior.

Bottom line

  • I did not find a concrete bug in the USE_BACKPORT_BINDINGS / SKIP_CUDA_BINDINGS_TEST split itself.
  • The two findings above are the only items that looked important enough to hand back to the PR author.

cpcloud added 2 commits April 16, 2026 12:49
copy-pr-bot mirrors every PR to a pull-request/<N> branch regardless
of whether the upstream PR targets main or a backport branch such as
12.9.x. Diffing against origin/main unconditionally therefore
misclassifies changed paths on backport PRs and silently suppresses
the cuda.bindings test matrix.

Look up the real base ref via nv-gha-runners/get-pr-info (already used
elsewhere in this repo) and pass it to git merge-base so the changed-
paths classification matches the PR's actual target.
The checks job used if:always() plus shell-level result inspection of
doc and the three test jobs, treating skipped as non-fatal to preserve
intentional [doc-only] skips. detect-changes is a needs prerequisite
of every test job but was absent from checks.needs, so a failure in
the gating step silently cascaded into test-job skips and a green
final status.

Add detect-changes to checks.needs and require its result to be
success. The legitimate doc-only skip path is unaffected because that
leaves detect-changes itself successful.
@rwgk
Copy link
Copy Markdown
Contributor

rwgk commented Apr 16, 2026

Cursor GPT-5.4 Extra High Fast generated:


Reviewed the current final tree for PR 1908 at commit c11f901.

Findings

1. Medium: detect-changes can under-classify cross-package renames

Evidence

  • .github/workflows/ci.yml:134 builds the changed-path list with:
    • git diff --name-only "$base"...HEAD
  • .github/workflows/ci.yml:141 through .github/workflows/ci.yml:146 classify changed modules only from those reported paths.
  • The resulting test_bindings output then drives:
    • .github/workflows/ci.yml:308
    • .github/workflows/ci.yml:333
    • .github/workflows/ci.yml:358

Why this matters

  • In a pure rename or move, git diff --name-only can report only the destination path.
  • That means a refactor like cuda_bindings/... -> cuda_core/... can produce bindings=false even though the bindings package lost code.
  • In that case, skip-bindings-test may become true and reduce CI coverage for the PR.
  • The same shape of issue can apply to moves across shared prefixes.

Agent hints

  • The simplest fail-closed fix is probably:
    • use git diff --no-renames --name-only "$base"...HEAD
    • this reports both old and new paths for renames/moves, which is better for path-based gating
  • A more explicit alternative is:
    • use git diff --name-status -M "$base"...HEAD
    • when status is R*, inspect both the old path and the new path
  • If the PR author prefers to keep the current implementation for now, at least add a short comment documenting the rename blind spot so future readers do not assume the gating is exact.
  • A focused regression check would be valuable here, even if it is just a small shell-based test or script fixture for the detect-changes truth table.

Notes

  • The two earlier findings from my first review look addressed in the current branch.
  • Base-branch resolution now uses PR metadata via nv-gha-runners/get-pr-info in .github/workflows/ci.yml:115 through .github/workflows/ci.yml:124, which looks appropriate for pull-request/<N> refs that may target non-main branches.
  • The final checks job now includes detect-changes in needs at .github/workflows/ci.yml:382 and explicitly requires it to succeed at .github/workflows/ci.yml:416, so a broken gating step should no longer hide behind skipped downstream jobs.
  • I did not find a new issue in the USE_BACKPORT_BINDINGS / SKIP_CUDA_BINDINGS_TEST split; that part still looks coherent.
  • Residual caution: this is still a static CI review. I do not see a dedicated automated check for the detect-changes truth table, so future refactors could regress rename/path handling fairly easily.

Bottom line

  • I see one remaining medium-confidence CI edge case around rename handling.
  • Otherwise, the previously reported issues appear fixed.

Copy link
Copy Markdown
Contributor

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine to merge-as-is and work on the

detect-changes can under-classify cross-package renames

finding in a follow-on PR. I'm fine either way.

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 17, 2026

Thanks @rwgk — both findings addressed:

Finding 1 (High) — detect-changes could silently cascade into skipped tests. Added detect-changes to checks.needs with an explicit needs.detect-changes.result == 'success' guard, so a gating-step failure now fails the final status rather than producing a green run with reduced coverage. Commit c11f901. The legitimate [doc-only] skip path is unaffected (detect-changes stays success in that case).

Finding 2 (Medium) — diff baseline hardcoded to origin/main. Confirmed via copy-pr-bot config that pull-request/<N> branches are populated regardless of the PR's base (including 12.9.x), so the hardcode was indeed wrong for backport PRs. Resolved the real base at runtime via nv-gha-runners/get-pr-info@main (already used elsewhere in this repo) and pass it to git merge-base. Fails loudly if the base can't be resolved rather than silently falling back. Commit f410ba6.

@cpcloud cpcloud merged commit 7ab4682 into NVIDIA:main Apr 17, 2026
158 of 159 checks passed
@cpcloud cpcloud deleted the ci-detect-changes-299 branch April 17, 2026 13:04
@leofang
Copy link
Copy Markdown
Member

leofang commented Apr 17, 2026

Thanks, @cpcloud @rwgk! I noticed this fails on the main branch, could you take a look?
https://github.com/NVIDIA/cuda-python/actions/runs/24566662170/job/71828554003

@rwgk
Copy link
Copy Markdown
Contributor

rwgk commented Apr 17, 2026

It's failing in the last line here:

      - name: Detect changed paths
        id: filter
        if: ${{ startsWith(github.ref_name, 'pull-request/') }}
        env:
          BASE_REF: ${{ fromJSON(steps.pr-info.outputs.pr-info).base.ref }}

Maybe the env is evaluated even if the if evaluates to false?

github-actions Bot pushed a commit that referenced this pull request Apr 18, 2026
Removed preview folders for the following PRs:
- PR #1593
- PR #1908
- PR #1923
- PR #1925
- PR #1933
- PR #1934
- PR #1938
- PR #1939
mdboom pushed a commit to mdboom/cuda-python that referenced this pull request Apr 20, 2026
* ci: detect changed paths and gate cuda.bindings tests

Add a detect-changes job to ci.yml that classifies which top-level
modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder,
cuda_python, cuda_python_test_helpers, shared infra) using
dorny/paths-filter. The job emits composed gating outputs that account
for the dependency graph (pathfinder -> bindings -> core).

Thread a new skip-bindings-test input through the reusable test-wheel
workflows so cuda.bindings tests (and their Cython counterparts) are
skipped when the detect-changes output for test_bindings is false. For
PRs that only touch cuda_core this skips the expensive bindings suite
while still running cuda.core and cuda.pathfinder tests against the
wheel built in the current CI run.

Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars
into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport
branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST
remains the test-time gate. This lets path-filter-based skips reuse the
existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a
cross-branch artifact fetch.

Non-PR events (push to main, tag, schedule, workflow_dispatch) still
exercise the full pipeline.

Refs NVIDIA#299

* fix: set explicit dorny base and decouple cython-test skip

Two fixes from code review:

1. Set `base: main` on dorny/paths-filter so backport PRs targeting
   non-default branches still diff against main for path detection.

2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The
   path-filter override only skips bindings tests; cuda.core Cython
   tests should still run on core-only PRs. Cython skip is now driven
   solely by CTK minor-version mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove explicit dorny base, rely on v3 default

dorny/paths-filter v3 already diffs against the repo default branch
for non-default-branch pushes. Explicit base: main was redundant and
would produce wrong baselines for backport PRs targeting release
branches (all files seen as changed vs main).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace dorny/paths-filter with native git diff

Remove third-party dorny/paths-filter action dependency. Use
git merge-base + git diff --name-only + grep instead. Same
behavior, zero supply-chain risk, full control over base ref
resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: use herestring instead of echo pipe in has_match

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: resolve PR base branch at runtime for detect-changes diff

copy-pr-bot mirrors every PR to a pull-request/<N> branch regardless
of whether the upstream PR targets main or a backport branch such as
12.9.x. Diffing against origin/main unconditionally therefore
misclassifies changed paths on backport PRs and silently suppresses
the cuda.bindings test matrix.

Look up the real base ref via nv-gha-runners/get-pr-info (already used
elsewhere in this repo) and pass it to git merge-base so the changed-
paths classification matches the PR's actual target.

* ci: require detect-changes success in final checks aggregation

The checks job used if:always() plus shell-level result inspection of
doc and the three test jobs, treating skipped as non-fatal to preserve
intentional [doc-only] skips. detect-changes is a needs prerequisite
of every test job but was absent from checks.needs, so a failure in
the gating step silently cascaded into test-job skips and a green
final status.

Add detect-changes to checks.needs and require its result to be
success. The legitimate doc-only skip path is unaffected because that
leaves detect-changes itself successful.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD CI/CD infrastructure enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: Detect if build/test is needed

3 participants