CI: Test sdist completeness for all user-facing packages on Linux and Windows #1909

Merged
cpcloud merged 6 commits into NVIDIA:main from cpcloud:ci-test-sdist-1599 on Apr 21, 2026

@cpcloud cpcloud commented Apr 14, 2026

Summary

Adds a CI job that builds an sdist then a wheel-from-sdist for each of the
4 user-facing packages (cuda_pathfinder, cuda_python, cuda_bindings,
cuda_core) on both Linux and Windows. This catches MANIFEST.in /
package-data drift regressions (as previously slipped through in #1588) and
now also catches platform-gated source drift.

Why both platforms

cuda_bindings/build_hooks.py picks different *.pyx variants at build
time depending on sys.platform (*_linux.pyx vs *_windows.pyx), so a
Linux-only sdist completeness test cannot prove that the sdist ships the
Windows sources required to build a wheel-from-sdist on Windows.

Running the same build chain on Windows in CI closes that gap for
cuda_bindings (which also forces cuda_core to exercise the same
wheel-from-sdist code path, since cuda_core's wheel build requires a
cuda_bindings wheel).
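
The gap described above can also be illustrated with a small static check on the sdist contents. This is only a sketch, not what the CI job does (the job builds a wheel from the sdist on each platform). It assumes that every `*_linux.pyx` file is expected to have a `*_windows.pyx` sibling and vice versa; the helper names and the example paths are hypothetical.

```python
import tarfile


def sdist_member_names(sdist_path):
    """Return the member paths stored in a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return archive.getnames()


def unpaired_platform_sources(member_names):
    """List the sibling files that are missing from the sdist.

    Assumption: platform-gated sources come in *_linux.pyx / *_windows.pyx
    pairs, so a file for one platform without its sibling signals drift.
    """
    names = set(member_names)
    pairs = (("_linux.pyx", "_windows.pyx"), ("_windows.pyx", "_linux.pyx"))
    missing = []
    for name in sorted(names):
        for suffix, other_suffix in pairs:
            if name.endswith(suffix):
                sibling = name[: -len(suffix)] + other_suffix
                if sibling not in names:
                    missing.append(sibling)
    return missing


# Hypothetical member list: foo is paired, bar lacks its Windows variant.
example = ["pkg/foo_linux.pyx", "pkg/foo_windows.pyx", "pkg/bar_linux.pyx"]
print(unpaired_platform_sources(example))
```

A check like this is cheaper than a Windows build but weaker: it cannot detect a Windows source that is present but stale, which is why building on both platforms gives the stronger guarantee.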

Layout

  • .github/workflows/test-sdist-linux.yml: Linux sdist job (self-hosted
    runner, sccache + proxy cache, same CTK fetch as build-wheel.yml).
  • .github/workflows/test-sdist-windows.yml: Windows sibling
    (windows-2022 + ilammy/msvc-dev-cmd, no sccache or proxy cache, same
    CTK fetch on win-64).
  • ci.yml wires both as test-sdist-linux / test-sdist-windows, gated by
    should-skip and doc-only, and both are included in the checks
    aggregator so either failing blocks merge.

Scope

Pure-Python packages (cuda_pathfinder, cuda_python) don't depend on
platform-gated sources, so the Windows job is primarily for cuda_bindings
and cuda_core. The pure-Python builds are included on Windows too because
they are essentially free and keep the two workflows symmetric.

Closes #1599.

@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
@cpcloud cpcloud added the P0 (High priority - Must do!) and CI/CD (CI/CD infrastructure) labels Apr 14, 2026
@cpcloud cpcloud self-assigned this Apr 14, 2026
@cpcloud cpcloud force-pushed the ci-test-sdist-1599 branch from e65f1ed to 113573e Compare April 14, 2026 22:15
@cpcloud cpcloud requested a review from rwgk April 17, 2026 14:25
cpcloud and others added 4 commits April 17, 2026 11:54

Add a new test-sdist workflow that builds an sdist and then a
wheel-from-sdist for each of the 4 user-facing packages
(cuda_pathfinder, cuda_python, cuda_bindings, cuda_core). This
catches regressions in MANIFEST.in or package-data configuration
that could silently break sdist-based builds.

Closes NVIDIA#1599

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Match existing test jobs' doc-only guard so docs-only PRs don't
run source-build validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cuda_bindings build backend imports cuda.pathfinder, so pip's build
isolation needs to find the locally-built cuda_pathfinder wheel
instead of pulling from PyPI. Set PIP_FIND_LINKS to the pathfinder
dist directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

actionlint rejects non-GitHub-hosted runner labels unless declared
in .github/actionlint.yaml. This was causing pre-commit.ci failures
on test-sdist.yml which uses linux-amd64-cpu8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@cpcloud cpcloud force-pushed the ci-test-sdist-1599 branch from 7916cf3 to 47bccd8 Compare April 17, 2026 15:54

rwgk commented Apr 17, 2026

Posting this separately as an optional suggestion. LGTM will be in another comment.

Outcome of a few rounds of prompt/response using Cursor GPT-5.4 Extra High Fast


Analysis

I think the current test-sdist.yml setup is probably fine in practice today, because the branch-local setuptools-scm versions are ahead of the currently published compatible releases.

However, PIP_FIND_LINKS alone does not fully encode the intended guarantee. It tells pip where local candidates live, but it does not require build isolation to use those exact local internal wheels. So the job is still somewhat fragile:

  • it depends on the current version ordering
  • it depends on what is or is not already published
  • it could become ambiguous again on a rerun, a backport, or once a newer compatible release exists on PyPI

So I would frame this as a robustness / explicitness issue more than an immediate correctness bug.

The nice part is that I think removing that uncertainty is fairly easy.

Suggestions

1. Best balance: keep build isolation, add exact build constraints

This seems like the best option to me.

Keep the existing pattern:

  • build local internal wheels first
  • use PIP_FIND_LINKS to point pip at those local wheel directories

But add exact --build-constraint files for the pip wheel ... <sdist.tar.gz> steps.

For cuda.bindings, constrain the local cuda-pathfinder wheel version exactly.

For cuda.core, constrain both:

  • the local cuda-bindings wheel version
  • the local cuda-pathfinder wheel version

Why this is attractive:

  • it preserves build isolation
  • it keeps the test meaningfully close to real PEP 517 behavior
  • it removes the resolver ambiguity cleanly
  • the local wheels have unique +g<sha> versions, so exact constraints should force the branch-local artifacts in practice

2. Stronger but heavier: use --no-index with a fully local wheelhouse

This would remove even more uncertainty:

  • local internal wheels come only from the wheelhouse
  • third-party build requirements also come only from the wheelhouse

But this is more work, because the wheelhouse would need to contain everything needed for build isolation:

  • setuptools
  • setuptools-scm
  • build
  • Cython
  • pyclibrary
  • and so on

This is probably overkill unless there is a broader goal of making the job network-independent or completely hermetic.

3. Simplest but weaker: disable build isolation

Another option is:

  • install the local internal wheels into the job environment first
  • then build with --no-build-isolation

This would be easy and deterministic, but it weakens the test. The point of this job seems to be checking sdist/build-backend correctness under something close to normal isolated builds, so I think this is the least attractive option.

Suggested implementation direction

If the goal is just to remove uncertainty without broadening the PR much, I would suggest option 1.

Concretely:

  • after building cuda_pathfinder/dist/*.whl, derive its exact version from the wheel filename
  • write a small constraints file for the cuda.bindings wheel-from-sdist step
  • after building cuda_bindings/dist/*.whl, do the same for cuda.core
  • keep the existing PIP_FIND_LINKS
  • add --build-constraint <file> to the pip wheel commands

I would focus this only on the pip wheel ... <sdist> steps. That is the part where the cross-package build dependency resolution matters. The python -m build --sdist ... steps are less important for this specific ambiguity.

Small example shape

For cuda.bindings, something conceptually like:

export PIP_FIND_LINKS="$(pwd)/cuda_pathfinder/dist"
python write_constraints.py cuda_pathfinder/dist > cuda_bindings/build-constraints.txt
pip wheel \
  --no-deps \
  --build-constraint cuda_bindings/build-constraints.txt \
  --wheel-dir cuda_bindings/dist \
  cuda_bindings/dist/*.tar.gz

For cuda.core, same idea, but constrain both local internal deps.
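
The `write_constraints.py` helper named in the shell example is not shown in this thread. A minimal sketch of what it might look like follows, assuming it takes one or more dist directories and prints one exact pin per wheel. To stay standard-library only, it splits the wheel filename on `-` by hand; `packaging.utils.parse_wheel_filename(...)` would be the more robust choice if `packaging` is available. The function names and the example version string are hypothetical.

```python
"""Hypothetical write_constraints.py: print exact pins for locally built wheels."""
import sys
from pathlib import Path


def wheel_name_and_version(wheel_filename):
    """Split '{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl'."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    parts = wheel_filename[: -len(".whl")].split("-")
    # Five components, or six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {wheel_filename!r}")
    return parts[0], parts[1]


def constraint_lines(wheel_filenames):
    """Return 'name==version' pins, one per wheel, in sorted order."""
    lines = []
    for filename in sorted(wheel_filenames):
        name, version = wheel_name_and_version(filename)
        # Wheel filenames use underscores; requirement names use hyphens.
        lines.append(f"{name.replace('_', '-')}=={version}")
    return lines


def main(dist_dirs):
    """Print pins for every wheel found in the given dist directories."""
    for dist_dir in dist_dirs:
        wheels = [path.name for path in Path(dist_dir).glob("*.whl")]
        for line in constraint_lines(wheels):
            print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

For a wheel named `cuda_pathfinder-1.2.3.dev4+gabc1234-py3-none-any.whl` (an invented version), this would emit `cuda-pathfinder==1.2.3.dev4+gabc1234`. For cuda.core, passing both `cuda_pathfinder/dist` and `cuda_bindings/dist` would yield both pins in one file.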

Agent hint

If the author wants to hand this to an agent, I would give it these constraints:

  • preserve build isolation
  • do not switch to --no-build-isolation
  • keep PIP_FIND_LINKS
  • derive exact versions from the just-built wheel filenames, rather than hardcoding anything
  • prefer a tiny inline Python snippet using packaging.utils.parse_wheel_filename(...) to extract versions robustly
  • constrain only the wheel-from-sdist steps unless there is a clear reason to widen the scope

That should keep the change small, explicit, and easy to review.

@rwgk rwgk left a comment

Looks great.

One small caveat: the new test-sdist job only runs on Linux, so it does not fully prove platform-independent sdist completeness for packages that have platform-specific sources. For example, cuda_bindings/build_hooks.py selects different *_linux.pyx vs *_windows.pyx files, so a missing or stale Windows-only source could still slip through. I think this is still a useful test as-is; I would probably either add a small Windows follow-up job (at least for cuda_bindings, and likely cuda_core too), or keep this PR as-is but make it clear in the PR title and description that this is a Linux-side sdist smoke test rather than a full cross-platform proof.

Split test-sdist.yml into test-sdist-linux.yml (renamed) and a new
test-sdist-windows.yml sibling, mirroring the test-wheel-linux/-windows
pattern. cuda_bindings/build_hooks.py selects platform-specific *.pyx
variants (_linux.pyx on Linux, _windows.pyx on Windows) at build time,
so a Linux-only sdist test cannot prove the sdist ships the Windows
sources required to build a wheel-from-sdist on Windows.

The Windows variant uses the repo's windows-2022 + ilammy/msvc-dev-cmd
setup already used by build-wheel.yml, omits sccache and
nv-gha-runners/setup-proxy-cache (both limited to Linux in build-wheel
by convention), and otherwise mirrors the Linux steps.

ci.yml now invokes both workflows (test-sdist-linux, test-sdist-windows)
and the checks aggregator fails if either is cancelled or failed.

Addresses review feedback from PR NVIDIA#1909.

@cpcloud cpcloud changed the title from "CI: Test sdist builds for all user-facing packages" to "CI: Test sdist completeness for all user-facing packages on Linux and Windows" Apr 21, 2026

Convert the find-links paths to native Windows style with cygpath -w so
they match the convention used in build-wheel.yml's Windows CIBW
environment. This avoids ambiguity when pip splits PIP_FIND_LINKS on
spaces and is asked to resolve a mix of path styles.
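
To illustrate the path-style concern in the commit message above, here is a rough sketch of the conversion and of why whitespace matters. It is not the workflow's code, which calls `cygpath -w` directly. The converter below only mimics the Git Bash drive-letter convention (`/d/...` becomes `D:\...`) and is an assumption made for illustration; the helper names and example paths are hypothetical.

```python
import re


def msys_to_windows(path):
    """Convert a Git Bash style path such as /d/a/repo to D:\\a\\repo.

    Stand-in for `cygpath -w`, covering only the /<drive>/... form.
    """
    match = re.fullmatch(r"/([A-Za-z])(/.*)?", path)
    if match is None:
        raise ValueError(f"not a drive-letter style path: {path!r}")
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return f"{drive}:" + rest.replace("/", "\\")


def find_links_value(paths):
    """Build a space-separated PIP_FIND_LINKS value in one path style."""
    converted = [msys_to_windows(path) for path in paths]
    for path in converted:
        # The value is split on spaces, so a path containing whitespace
        # would be read as several separate entries.
        if any(char.isspace() for char in path):
            raise ValueError(f"path contains whitespace: {path!r}")
    return " ".join(converted)


print(find_links_value(["/d/a/repo/cuda_pathfinder/dist",
                        "/d/a/repo/cuda_bindings/dist"]))
```

Converting every entry the same way keeps the variable in a single path style, which is the property the commit is after.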
@cpcloud cpcloud enabled auto-merge (squash) April 21, 2026 13:59
@cpcloud cpcloud merged commit b3ae845 into NVIDIA:main Apr 21, 2026
93 checks passed
@cpcloud cpcloud deleted the ci-test-sdist-1599 branch April 21, 2026 14:01
github-actions Bot pushed a commit that referenced this pull request Apr 22, 2026
Removed preview folders for the following PRs:
- PR #1909