Skip to content

Conversation

@doru1004
Copy link
Contributor

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
One sentence describing the work done.

Why were the changes made?
Explain the motivation behind the work. Provide any publicly-available historical context.

How was the outcome achieved?
Technical details behind the work. Explain any publicly-available hardware peculiarities.

Additional Documentation:
What else should the reviewer know?

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

@doru1004 doru1004 force-pushed the indirect-functions-with-link-speedup-and-optional-generic branch 2 times, most recently from 385a303 to 32443aa Compare December 20, 2025 00:24
@alex-breslow-amd
Copy link
Contributor

I think my main comment is that there are many changes in files that we have inherited from NCCL. If there is any way to put most of these changes in new files, I think that would help make future code syncs with NCCL simpler.

Improve link time by splitting over types

Create separate compilation units and commands for each type.

Split across types and unroll factor.
@doru1004 doru1004 force-pushed the indirect-functions-with-link-speedup-and-optional-generic branch from 32443aa to c56107c Compare January 6, 2026 19:29
#endif // BUILD_GENERIC_KERNELS

// Type-specific kernel declarations
__global__ void ncclDevKernel_i8_1(ncclDevKernelArgs4K NCCL_GRID_CONSTANT const args4K);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to use ncclDevKernelArgsDefaultStorage instead of ncclDevKernelArgs4K

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants