
fix(ci): replace push path filters with runtime path gate in GPU workflows #558

Merged
mchmarny merged 7 commits into NVIDIA:main from yuanchen8911:fix/gpu-workflow-path-gate
Apr 14, 2026

yuanchen8911 (Contributor) commented Apr 13, 2026

Summary

Replace on.push.paths filters with a runtime check-paths gate job in all four GPU workflows. The gate uses dorny/paths-filter with base: main to evaluate the actual PR diff, making it the single source of truth for path-based GPU test gating.

This also isolates manual workflow_dispatch runs from push concurrency so unrelated PR updates do not cancel in-flight manual GPU validations.

Motivation / Context

GPU workflows trigger on push to pull-request/* branches (created by copy-pr-bot). The existing on.push.paths filters compare against the previous push SHA, not the merge base, causing two failure modes:

  1. False positives: rebasing onto main pulls already-merged changes into the push diff, triggering GPU tests for unrelated PRs. This was observed on #535 ("fix(ci): auto-label new issues by area and assign owners") after a rebase onto main brought in the GPU workflow changes from #539 ("fix(ci): improve GPU test reliability and deploy timeout handling").
  2. False negatives: a subsequent push touching only unrelated files prevents the workflow from starting, even when the PR still contains GPU-relevant changes.
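For context, the trigger being removed has roughly the shape below. This is a sketch, not the repository's actual workflow: the path globs are hypothetical placeholders. For push events GitHub evaluates paths against the commits in that push (previous branch head to new head), so a rebase force-push can match files the PR never touched, and a later push that touches only unrelated files never starts the workflow at all.

```yaml
# Sketch of the removed trigger shape; the path globs are hypothetical placeholders.
on:
  push:
    branches:
      - 'pull-request/*'
    # Matched against the files changed by this push (previous head -> new head),
    # not against the PR's merge base with main.
    paths:
      - 'pkg/**'
      - '.github/workflows/gpu-*.yaml'
```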

Broadening push to all pull-request/* updates also means workflow-level concurrency must be event-aware; otherwise an unrelated push run can cancel an in-flight manual workflow_dispatch run before check-paths gets a chance to skip the GPU job.
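One way to make the group event-aware is to append github.event_name to the concurrency group key. The sketch below assumes a github.workflow and github.ref prefix; the exact group expression in these workflows may differ. Because push and workflow_dispatch runs on the same pull-request/* ref now resolve to different group names, cancel-in-progress only cancels earlier runs of the same event type.

```yaml
# Sketch of event-aware concurrency; the workflow/ref prefix is an assumption.
concurrency:
  # push and workflow_dispatch runs on the same ref land in separate groups,
  # so a new push cancels older push runs but not a manual run.
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
  cancel-in-progress: true
```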

Fixes: N/A
Related: #553, #554

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: GPU CI workflows

Implementation Notes

  • Removes on.push.paths from all four GPU workflows — the push trigger now fires on any push to pull-request/* branches
  • Adds a check-paths gate job to each workflow using dorny/paths-filter@v3.0.2 with base: main
  • check-paths is the single source of truth for path filtering — no duplicate path lists between trigger-level and job-level
  • The gate only runs for push events (~10s on ubuntu-latest)
  • schedule, workflow_dispatch, and pull_request (labeled) events bypass the gate and run unconditionally
  • Concurrency groups now include github.event_name so manual workflow_dispatch runs do not share cancellation state with push runs on the same PR branch

| Event | check-paths | GPU test |
|---|---|---|
| schedule | Skipped | Runs |
| workflow_dispatch | Skipped | Runs |
| pull_request + run-gpu-tests label | Skipped | Runs |
| push + relevant paths changed vs main | Runs (output: true) | Runs |
| push + no relevant paths changed vs main | Runs (output: false) | Skipped |
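A minimal sketch of the gate pattern for a single workflow, encoding the event table above. Only check-paths, dorny/paths-filter@v3.0.2, base: main, and the run-gpu-tests label come from this PR; the gpu-test job name, the gpu filter name, the path globs, the cron schedule, and the runner labels are hypothetical placeholders.

```yaml
# Sketch only: names marked hypothetical are placeholders, not the repo's values.
on:
  push:
    branches:
      - 'pull-request/*'          # no paths filter: every push starts the workflow
  pull_request:
    types: [labeled]
  workflow_dispatch:
  schedule:
    - cron: '0 6 * * *'           # hypothetical schedule

jobs:
  check-paths:
    # Only push events are gated; other events skip this job.
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    outputs:
      gpu: ${{ steps.filter.outputs.gpu }}
    steps:
      - uses: actions/checkout@v4
      - id: filter
        uses: dorny/paths-filter@v3.0.2
        with:
          # Diff the pushed branch against main rather than the previous push.
          base: main
          filters: |
            gpu:
              - 'pkg/**'
              - '.github/workflows/gpu-*.yaml'

  gpu-test:                       # hypothetical job name
    needs: check-paths
    # !cancelled() keeps this job eligible when check-paths was skipped.
    if: >-
      ${{ !cancelled() && (
        (github.event_name == 'push' && needs.check-paths.outputs.gpu == 'true') ||
        (github.event_name == 'pull_request' && github.event.label.name == 'run-gpu-tests') ||
        github.event_name == 'workflow_dispatch' ||
        github.event_name == 'schedule') }}
    runs-on: [self-hosted, gpu]   # hypothetical runner labels
    steps:
      - uses: actions/checkout@v4
      - run: echo "GPU tests would run here"
```

The status-check function matters here: by default, a job whose dependency was skipped is skipped as well, so without !cancelled() the schedule, workflow_dispatch, and labeled pull_request runs would never reach the GPU job.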

Testing

unset GITLAB_TOKEN && make lint
unset GITLAB_TOKEN && make qualify
  • make lint passed locally
  • make qualify failed in this workspace due to an unrelated Go toolchain mismatch: compile: version "go1.26.1" does not match go tool version "go1.25.6"

CI-only change — no Go code modified.

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes: If the gate causes false negatives (GPU tests not running when they should), the run-gpu-tests label bypass still works as a safety valve. Revert is a single commit.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

yuanchen8911 requested a review from a team as a code owner April 13, 2026 18:48
yuanchen8911 force-pushed the fix/gpu-workflow-path-gate branch from a9aa368 to 80866c4 April 13, 2026 18:54
github-actions bot added size/L and removed size/M labels Apr 13, 2026
yuanchen8911 changed the title from "fix(ci): gate GPU workflows on actual PR diff against main" to "fix(ci): replace push path filters with runtime path gate in GPU workflows" Apr 13, 2026
yuanchen8911 force-pushed the fix/gpu-workflow-path-gate branch from 80866c4 to 7b4b089 April 13, 2026 21:29
yuanchen8911 requested review from dims and lockwobr April 13, 2026 23:02
yuanchen8911 force-pushed the fix/gpu-workflow-path-gate branch from 7b4b089 to 43ce3e5 April 13, 2026 23:19
…flows

GPU workflows use push path filters on pull-request/* branches created
by copy-pr-bot. These filters compare against the previous push SHA, not
the merge base, causing two failure modes:

1. False positives: rebasing onto main includes merged changes in the
   push diff, triggering GPU tests for unrelated PRs.
2. False negatives: a subsequent push touching only unrelated files
   prevents the workflow from starting, even when the PR still contains
   GPU-relevant changes.

Replace the push-level path filters with a runtime check-paths gate job
using dorny/paths-filter with base: main. This compares the actual PR
diff against main, making check-paths the single source of truth for
path-based GPU test gating. The push trigger now fires on any push to
pull-request/* branches, and the check-paths job determines whether the
expensive GPU job should run.

Schedule, workflow_dispatch, and labeled pull_request events bypass the
gate and run unconditionally.
mchmarny enabled auto-merge (squash) April 14, 2026 15:33
mchmarny merged commit d988b02 into NVIDIA:main Apr 14, 2026
20 checks passed

3 participants