Drop deprecated CUB macros #3821
Merged
bernhardmgruber merged 2 commits into `NVIDIA:main` on Mar 4, 2025
Force-pushed from `5c959be` to `fa676c1`
Force-pushed from `fa676c1` to `33e712d`
miscco reviewed on Feb 24, 2025
Comment on lines +46 to +47

    #include <limits>

Contributor
Should we move towards `::cuda::std::numeric_limits`?
Contributor (Author)
I thought so too, but in the example it wasn't necessary, and maybe `<limits>` is more familiar. I really don't mind.
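For host-only example code, `std::numeric_limits` from `<limits>` is sufficient; libcu++ provides the same interface as `::cuda::std::numeric_limits` in `<cuda/std/limits>`, which is also usable in device code. The sketch below is a hypothetical host-side helper (not the example touched by this PR) that seeds a min-reduction with `std::numeric_limits<int>::max()` as the identity element; the name `min_reduce` is illustrative only.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical helper: minimum of a sequence, seeded with the identity of min.
// In CCCL code the seed could instead be spelled
// ::cuda::std::numeric_limits<int>::max() (from <cuda/std/limits>).
int min_reduce(const std::vector<int>& values)
{
  int result = std::numeric_limits<int>::max(); // identity element for min
  for (int v : values)
  {
    result = std::min(result, v);
  }
  return result; // an empty input yields the identity
}
```

Since both spellings expose the same `max()` member, switching the example to `::cuda::std::numeric_limits` would mainly be a matter of consistency with the rest of CCCL rather than of behavior on the host.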
Force-pushed from `33e712d` to `7608f32`
Force-pushed from `7608f32` to `8881a1d`
Contributor
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Contributor
🟨 CI finished in 1h 51m: Pass: 94%/93 | Total: 2d 00h | Avg: 30m 59s | Max: 1h 14m | Hits: 84%/127854
Modifications in project?
| +/- | Project |
|---|---|
|  | CCCL Infrastructure |
|  | libcu++ |
| +/- | CUB |
|  | Thrust |
|  | CUDA Experimental |
|  | python |
| +/- | CCCL C Parallel Library |
|  | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
|  | CCCL Infrastructure |
|  | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|  | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
Force-pushed from `8e1a5bb` to `9bf7bbc`
Force-pushed from `9bf7bbc` to `a1ef5b8`
fbusato reviewed on Feb 25, 2025

      // Implement part of MemBoundScaling
    - items_per_thread = CUB_MAX(1, CUB_MIN(items_per_thread * 4 / accumulator_type.size, items_per_thread * 2));
    - block_size = CUB_MIN(block_size, (((1024 * 48) / (accumulator_type.size * items_per_thread)) + 31) / 32 * 32);
    + items_per_thread = cuda::std::clamp(items_per_thread * 4 / accumulator_type.size, 1, items_per_thread * 2);

Contributor
I think we need to be consistent with the namespace usage: `_CUDA_VSTD` vs. `cuda::std::`. Also, the leading global-namespace qualifier is missing (`::cuda`...).
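The `MemBoundScaling` fragment can be modelled on the host to check that the `clamp` spelling agrees with the old macro spelling. In this sketch `std::min`, `std::max` and `std::clamp` (C++17) stand in for `CUB_MIN`, `CUB_MAX` and `cuda::std::clamp`, a plain `int accumulator_size` replaces `accumulator_type.size`, and all function names are hypothetical. `clamp(x, lo, hi)` requires `lo <= hi`, so the two spellings agree only while `items_per_thread * 2 >= 1`, which holds for any positive `items_per_thread`. The `block_size` line is read here (our interpretation) as dividing 48 KiB by the bytes each thread handles and rounding the quotient up to a multiple of 32, the warp size.

```cpp
#include <algorithm>

// Old spelling, modelled on CUB_MAX(1, CUB_MIN(x, hi)).
int items_per_thread_old(int items_per_thread, int accumulator_size)
{
  return std::max(1, std::min(items_per_thread * 4 / accumulator_size, items_per_thread * 2));
}

// New spelling, modelled on cuda::std::clamp(x, 1, hi).
// std::clamp requires lo <= hi, i.e. items_per_thread >= 1 here.
int items_per_thread_new(int items_per_thread, int accumulator_size)
{
  return std::clamp(items_per_thread * 4 / accumulator_size, 1, items_per_thread * 2);
}

// Cap on threads: 48 KiB divided by the bytes each thread handles,
// rounded up to a multiple of 32 (one warp), then applied as an upper bound.
int capped_block_size(int block_size, int accumulator_size, int items_per_thread)
{
  const int cap         = (1024 * 48) / (accumulator_size * items_per_thread);
  const int cap_rounded = (cap + 31) / 32 * 32;
  return std::min(block_size, cap_rounded);
}
```

Because the cap is rounded up rather than down, the resulting footprint can slightly exceed 48 KiB: with `accumulator_size = 12` and `items_per_thread = 5` the cap becomes 832 threads, i.e. 832 * 60 = 49920 bytes.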
Force-pushed from `b635217` to `d01fe85`
Force-pushed from `d01fe85` to `5d5215b`
Force-pushed from `db00811` to `2dda2c3`
Force-pushed from `2dda2c3` to `b60fe52`
elstehle approved these changes on Mar 4, 2025
Contributor
🟨 CI finished in 6h 04m: Pass: 98%/93 | Total: 3d 02h | Avg: 48m 10s | Max: 5h 59m | Hits: 41%/133724
Modifications in project?
| +/- | Project |
|---|---|
|  | CCCL Infrastructure |
|  | libcu++ |
| +/- | CUB |
|  | Thrust |
|  | CUDA Experimental |
|  | python |
|  | CCCL C Parallel Library |
|  | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
|  | CCCL Infrastructure |
|  | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|  | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
Contributor
🟩 CI finished in 6h 26m: Pass: 100%/93 | Total: 2d 20h | Avg: 44m 26s | Max: 1h 24m | Hits: 41%/133878
Modifications in project?
| +/- | Project |
|---|---|
|  | CCCL Infrastructure |
|  | libcu++ |
| +/- | CUB |
|  | Thrust |
|  | CUDA Experimental |
|  | python |
|  | CCCL C Parallel Library |
|  | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
|  | CCCL Infrastructure |
|  | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|  | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 93)
| # | Runner |
|---|---|
| 66 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-h100-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
davebayer pushed a commit to davebayer/cccl that referenced this pull request on Apr 7, 2025
Split out of #3748, since that change causes SASS changes in at least `cub.bench.copy.memcpy.base`. Several split-off PRs proposed sub-parts of it, accompanied by SASS diffs and benchmarks. What remains in this PR is only the removal of the unused macros (plus a few missing includes).