@KawtharShafie
Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: "Internal", or link to GitHub issue (if applicable).

What were the changes?
New implementation of a direct send/recv reduce scatter.
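For reviewers unfamiliar with the pattern, here is a minimal CPU-side sketch of what a direct send/recv reduce scatter computes. It is not the code in this PR: the function name `direct_reduce_scatter`, the `staging` list (standing in for a temporary receive buffer), and the choice of sum as the reduction are all assumptions made for illustration. Each rank sends chunk `dst` of its input straight to rank `dst`, and each rank then reduces the chunks it received.

```python
def direct_reduce_scatter(send_bufs):
    """Simulate a direct (all-to-all style) reduce-scatter with a sum reduction.

    send_bufs[r] is rank r's input, made of n_ranks equal-sized chunks.
    Returns a list where entry r is the element-wise sum of chunk r
    taken over every rank's input.
    """
    n_ranks = len(send_bufs)
    total = len(send_bufs[0])
    if any(len(b) != total for b in send_bufs) or total % n_ranks != 0:
        raise ValueError("every rank needs n_ranks equal-sized chunks")
    chunk = total // n_ranks

    # "Send" phase: rank src delivers chunk dst directly to rank dst,
    # which stages it in a slot reserved for src (hypothetical temp buffer).
    staging = [[None] * n_ranks for _ in range(n_ranks)]
    for src in range(n_ranks):
        for dst in range(n_ranks):
            staging[dst][src] = send_bufs[src][dst * chunk:(dst + 1) * chunk]

    # "Reduce" phase: each rank sums the n_ranks staged chunks locally.
    return [[sum(vals) for vals in zip(*staging[dst])] for dst in range(n_ranks)]
```

With two ranks holding `[1, 2, 3, 4]` and `[10, 20, 30, 40]`, rank 0 ends up with the reduced first chunk and rank 1 with the reduced second chunk. Unlike a ring algorithm, every chunk travels exactly one hop, at the cost of staging space that grows with the number of ranks.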

Why were the changes made?
Explain the motivation behind the work. Provide any publicly-available historical context.

How was the outcome achieved?
Technical details behind the work. Explain any publicly-available hardware peculiarities.

Additional Documentation:
What else should the reviewer know?

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

Reset variable enabling direct RS across runs.

Update size of tempBuff and limit of direct RS.
@KawtharShafie force-pushed the direct-reduce-scatter branch from cff6c01 to 75547ab on January 3, 2026 00:59
…e scatter and adjust offset into buffer to utilize multiple channels.
…number of channels leaving elements unaccounted for in reduction.
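The commit messages above are truncated, so the exact fix is not visible here. As a generic illustration of the class of problem they describe (an element count that does not divide evenly across channels), the sketch below shows one common way to assign per-channel offsets so that no element is dropped. The helper `channel_ranges` and its spread-the-remainder policy are hypothetical and are not taken from this PR.

```python
def channel_ranges(count, n_channels):
    """Split `count` elements across `n_channels` as (offset, size) pairs.

    The first `count % n_channels` channels take one extra element, so the
    ranges are contiguous and together cover every element exactly once.
    """
    if n_channels <= 0:
        raise ValueError("n_channels must be positive")
    base, rem = divmod(count, n_channels)
    ranges = []
    offset = 0
    for ch in range(n_channels):
        size = base + (1 if ch < rem else 0)  # spread the remainder
        ranges.append((offset, size))
        offset += size
    return ranges
```

A naive split that uses only `count // n_channels` per channel would leave `count % n_channels` trailing elements out of the reduction, which is the kind of gap the second commit message appears to refer to.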