fix(ci): move GPU concurrency to test jobs #581
Merged
yuanchen8911 merged 1 commit into NVIDIA:main on Apr 15, 2026
dims approved these changes on Apr 15, 2026
a80c90f to 788d567
lockwobr approved these changes on Apr 15, 2026
Summary
Move GPU workflow concurrency from the workflow level down to the heavyweight GPU job so that the lightweight `check-paths` gate can always start and surface a visible check on PR updates.

Motivation / Context
GPU workflows on `pull-request/*` branches currently apply concurrency at the workflow level. When an older GPU run is still in progress, a newer workflow run can remain `pending` with `jobs: []`, which means the PR checks UI may not show the new training or inference run at all until the older run clears.

Moving the same concurrency group to the actual GPU job preserves the “latest run wins” behavior for expensive runner usage while still allowing `check-paths` to execute immediately and materialize a visible check.

Fixes: N/A
Related: #558
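
For illustration, the workflow-level arrangement described above can be sketched as follows. This is a simplified, hypothetical workflow, not the contents of the changed files: the workflow name, the `gpu-test` job name, the runner labels, the trigger block, and the step bodies are all assumptions. Only the `check-paths` gate, the `pull-request/*` branch pattern, and the concurrency key come from this PR.

```yaml
# Sketch of the arrangement BEFORE this change (assumed layout).
name: GPU Smoke Test            # hypothetical workflow name

on:
  push:
    branches:
      - "pull-request/*"        # branch pattern mentioned in this PR
  workflow_dispatch: {}

# Workflow-level concurrency: the whole run, including check-paths,
# waits on the group, so a newer run can sit pending with no jobs
# while an older run in the same group is still in progress.
concurrency:
  group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  check-paths:
    runs-on: ubuntu-latest      # assumed runner
    steps:
      - run: echo "compute changed paths"   # placeholder step

  gpu-test:                     # hypothetical name for the heavyweight GPU job
    needs: check-paths
    runs-on: [self-hosted, gpu] # assumed runner labels
    steps:
      - run: echo "run GPU tests"           # placeholder step
```

Because the group is attached to the workflow itself, no job in the newer run, including the lightweight gate, can start until the group is free.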
Type of Change

Component(s) Affected
- `cmd/aicr`, `pkg/cli`
- `cmd/aicrd`, `pkg/api`, `pkg/server`
- `pkg/recipe`
- `pkg/bundler`, `pkg/component/*`
- `pkg/collector`, `pkg/snapshotter`
- `pkg/validator`
- `pkg/errors`, `pkg/k8s`
- `docs/`, `examples/`

Implementation Notes
- Removed workflow-level `concurrency` from `gpu-smoke-test.yaml`, `gpu-h100-inference-test.yaml`, and `gpu-h100-training-test.yaml`.
- Moved the `concurrency.group` and `cancel-in-progress: true` settings to the heavyweight GPU test job in each workflow.
- Kept the existing concurrency key (`${{ github.workflow }}-${{ github.event_name }}-${{ github.ref }}`), so manual `workflow_dispatch` runs remain isolated from push-driven runs.
- Left the `check-paths` gate unconstrained so it can start immediately, compute the PR diff, and surface a visible status even while an older GPU run is still tearing down.

Testing
- `git diff --check` passed.
- `yamllint` passed on the three changed workflow files.
- `make lint` could not complete cleanly in this sandbox because a prior local `golangci-lint` process left the global parallel lock in place (`parallel golangci-lint is running`).

Risk Assessment
Rollout notes: This changes only the level at which concurrency is applied (job instead of workflow), not the concurrency key itself. The expensive GPU jobs still cancel older same-branch runs; the `check-paths` gate is what becomes immediately runnable and visible.

Checklist
- `make test` with `-race`
- `make lint`
- `git commit -S` (GPG signing info)
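
As a rough sketch of the job-level placement described under Implementation Notes, the `jobs` section of one workflow might look like the fragment below. The `gpu-test` job name, runner labels, and step bodies are hypothetical placeholders; the `check-paths` gate, the concurrency key, and `cancel-in-progress: true` are taken from this PR.

```yaml
# Sketch of the arrangement AFTER this change (assumed layout).
# There is no top-level `concurrency:` block any more.
jobs:
  check-paths:
    # Unconstrained: starts immediately and reports a visible check.
    runs-on: ubuntu-latest      # assumed runner
    steps:
      - run: echo "compute changed paths"   # placeholder step

  gpu-test:                     # hypothetical name for the heavyweight GPU job
    needs: check-paths
    runs-on: [self-hosted, gpu] # assumed runner labels
    # Same key as before, now scoped to the expensive job only.
    # github.event_name in the key keeps workflow_dispatch runs in a
    # separate group from push-driven runs on the same ref.
    concurrency:
      group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.ref }}
      cancel-in-progress: true
    steps:
      - run: echo "run GPU tests"           # placeholder step
```

With this layout only the GPU job queues on the group, so a newer run still cancels an older same-branch GPU job while its gate job runs right away.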