@Andy-Jost
Contributor

Andy-Jost commented Jan 12, 2026

Summary

  • Replace exhaustive O(n²) pairwise seed testing with prime-stride sampling
  • Reduces iterations from 32,640 (every pair i < j among 256 seeds) to ~100-150 while maintaining meaningful coverage
  • Removes the Windows skip marker added in #1456 (Add skipif IS_WINDOWS for test_patterngen_seeds), re-enabling the test on Windows

Closes #1455
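
For reference, the counts in the summary can be reproduced with a short Python sketch. It only counts seed pairs and does not touch `PatternGen`. The strides 17 and 19 and the `< 5` cut-off are the values that appear in the review discussion of this PR; the helper names `exhaustive_pairs` and `sampled_pairs` are hypothetical and exist only for this illustration.

```python
import math
from itertools import combinations

NUM_SEEDS = 256  # seeds 0..255, matching the range(256) loops in this PR


def exhaustive_pairs(n=NUM_SEEDS):
    """All unordered seed pairs (i, j) with i < j."""
    return list(combinations(range(n), 2))


def sampled_pairs(n=NUM_SEEDS, stride_i=17, stride_j=19, small=5):
    """Pairs kept when i and j are limited to small values or stride multiples."""
    keep_i = [i for i in range(n) if i < small or i % stride_i == 0]
    return [
        (i, j)
        for i in keep_i
        for j in range(i + 1, n)
        if j < small or j % stride_j == 0
    ]


full = exhaustive_pairs()
sampled = sampled_pairs()

# n * (n - 1) / 2 pairs in the exhaustive case, far fewer when sampled.
print(len(full), math.comb(NUM_SEEDS, 2), len(sampled))
```

Using two different primes for the two loops keeps the sampled `i` and `j` values from lining up on the same multiples, so the retained pairs are spread across the seed range rather than clustered.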

@Andy-Jost added the bug (Something isn't working) and cuda.core (Everything related to the cuda.core module) labels Jan 12, 2026
@Andy-Jost self-assigned this Jan 12, 2026
@Andy-Jost requested a review from rwgk January 12, 2026 18:36
@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test c48274e

log("done")


@pytest.mark.skipif(
Contributor Author

I could leave this in if we want to continue skipping this test on Windows. It is not important to test this on every platform IMO.

Collaborator

I'm 100% for removing this.

@Andy-Jost
Contributor Author

The test runs in 0.19s on Linux.

# especially on Windows. See https://github.com/NVIDIA/cuda-python/issues/1455
pgen = PatternGen(device, NBYTES)
- for i in range(256):
+ for i in range(0, 256, 17):
Collaborator

This isn't covering the usual trouble-maker corner case 1 anymore.

This would sample around the usual suspects (0, 1 corner case, then a couple around powers of 2):

(0, 1, 2, 3, 4, 5, 31, 32, 33, 127, 128, 129)
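
The suggested sample can be generated rather than typed by hand. This is a minimal sketch, assuming the intent is "small values plus the neighbours of selected powers of two"; the function name `corner_case_seeds` and its parameters are hypothetical and not part of the PR.

```python
def corner_case_seeds(max_small=5, exponents=(5, 7)):
    """Small seeds 0..max_small plus p - 1, p, p + 1 for each power of two p."""
    seeds = set(range(max_small + 1))
    for e in exponents:
        p = 1 << e  # 2 ** e
        seeds.update((p - 1, p, p + 1))
    return tuple(sorted(seeds))


print(corner_case_seeds())
```

With the default arguments this yields the twelve values listed in the comment; other exponents can be passed to probe different boundaries.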

Contributor Author

OK, I included values <5 since those are the most common. I don't think there's anything special about powers of two in this case.

Replace exhaustive O(n²) pairwise seed testing with prime-stride
sampling, reducing iterations from ~32k to ~100 while maintaining
meaningful coverage.

Closes NVIDIA#1455
@Andy-Jost force-pushed the fix-slow-patterngen-test branch from c48274e to d39f754 Compare January 12, 2026 19:09
@Andy-Jost
Contributor Author

/ok to test d39f754

Collaborator

@rwgk left a comment

That looks nice!

>>> n = 0
>>> for i in (ii for ii in range(0, 256) if ii < 5 or ii % 17 == 0):
...     js = tuple(jj for jj in range(i + 1, 256) if jj < 5 or jj % 19 == 0)
...     print(i, js)
...     n += len(js)
...
0 (1, 2, 3, 4, 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
1 (2, 3, 4, 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
2 (3, 4, 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
3 (4, 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
4 (19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
17 (19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
34 (38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
51 (57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
68 (76, 95, 114, 133, 152, 171, 190, 209, 228, 247)
85 (95, 114, 133, 152, 171, 190, 209, 228, 247)
102 (114, 133, 152, 171, 190, 209, 228, 247)
119 (133, 152, 171, 190, 209, 228, 247)
136 (152, 171, 190, 209, 228, 247)
153 (171, 190, 209, 228, 247)
170 (171, 190, 209, 228, 247)
187 (190, 209, 228, 247)
204 (209, 228, 247)
221 (228, 247)
238 (247,)
255 ()
>>> print(n)
171
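
The same generators can drive a test loop. The sketch below assumes the test's goal is that distinct seeds produce distinct patterns, which this page does not show explicitly. `fake_pattern` is a hypothetical stand-in for the real `PatternGen` helper, whose API is not reproduced here, and `NBYTES = 64` is an arbitrary illustrative size.

```python
NBYTES = 64  # arbitrary size for this sketch, not the value used by the real test


def fake_pattern(seed, nbytes=NBYTES):
    """Hypothetical stand-in for a seeded pattern: a byte ramp starting at `seed`."""
    return bytes((seed + k) % 256 for k in range(nbytes))


def check_sampled_seed_pairs():
    """Compare patterns for the sampled (i, j) pairs; return how many were checked."""
    checked = 0
    for i in (ii for ii in range(256) if ii < 5 or ii % 17 == 0):
        for j in (jj for jj in range(i + 1, 256) if jj < 5 or jj % 19 == 0):
            # Distinct seeds are expected to give distinct patterns.
            assert fake_pattern(i) != fake_pattern(j), (i, j)
            checked += 1
    return checked


print(check_sampled_seed_pairs())
```

The returned count matches the total printed in the session above, so the loop visits exactly the pairs enumerated there.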

@Andy-Jost
Contributor Author

/ok to test a4dcee1

@Andy-Jost
Contributor Author

/ok to test d0454f2

@Andy-Jost force-pushed the fix-slow-patterngen-test branch from d0454f2 to f428279 Compare January 13, 2026 19:58
@Andy-Jost enabled auto-merge (squash) January 13, 2026 19:58
@Andy-Jost
Contributor Author

/ok to test f428279

@Andy-Jost merged commit 8f9f7b5 into NVIDIA:main Jan 13, 2026
80 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

@Andy-Jost deleted the fix-slow-patterngen-test branch January 14, 2026 00:01
@leofang added this to the cuda.core beta 12 milestone Jan 14, 2026
Labels

bug (Something isn't working), cuda.core (Everything related to the cuda.core module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Extremely slow cuda_core/tests/test_helpers.py::test_patterngen_seeds on Windows

3 participants