
Avoid device sync in training loss accumulation #44123

Open
cyyever wants to merge 3 commits into huggingface:main from cyyever:cuda_sync3

Conversation

Contributor

@cyyever commented Feb 18, 2026

What does this PR do?

This PR avoids a device sync in training loss accumulation by using torch.where, so the loss value no longer has to be read back to the host to decide how to accumulate it. The is_torch_xla_available condition is also removed.
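A minimal sketch of the pattern, assuming the accumulation previously branched on an isnan/isinf check in Python; the names tr_loss, tr_loss_step, and steps_since_last_log are placeholders, not the Trainer's exact internals:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

tr_loss = torch.tensor(10.0, device=device)               # running accumulator
tr_loss_step = torch.tensor(float("nan"), device=device)  # current step's loss
steps_since_last_log = 4                                   # hypothetical counter

# Synchronizing version: the Python `if` must turn the tensor predicate into a
# bool, which blocks the host until the device has produced the value.
if torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step):
    synced = tr_loss + tr_loss / (1 + steps_since_last_log)
else:
    synced = tr_loss + tr_loss_step

# Sync-free version: torch.where selects between the two accumulations on the
# device, so the host never waits on the predicate.
unsynced = torch.where(
    torch.isnan(tr_loss_step) | torch.isinf(tr_loss_step),
    tr_loss + tr_loss / (1 + steps_since_last_log),
    tr_loss + tr_loss_step,
)

assert torch.equal(synced, unsynced)
```

Both forms compute the same result; the torch.where version simply keeps the branch on the device instead of round-tripping a boolean to the host.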

@Rocketknight1
Member

Hey @cyyever! We had some internal discussion about CUDA sync PRs - would you be able to provide a benchmark or profile graph showing the reduction in syncs? We might make those mandatory in the future, because it can sometimes be hard to tell from reading the code exactly where syncs would occur.
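For context, one way such syncs can be surfaced (an assumption about tooling, not something stated in this thread) is PyTorch's sync debug mode, which warns whenever an operation forces a device-to-host synchronization:

```python
import torch

# Sketch: warn (or use "error" to fail fast) on any hidden sync.
# Requires a CUDA device.
torch.cuda.set_sync_debug_mode("warn")

loss = torch.tensor(float("nan"), device="cuda")

# Branching on a tensor predicate calls bool(...), which copies the result
# to the host and triggers a sync warning here.
if torch.isnan(loss):
    pass

# The torch.where form stays on the device, so no warning is emitted.
_ = torch.where(torch.isnan(loss), torch.zeros_like(loss), loss)
```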

3 commits, each: Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>