
Avoid device sync in training loss accumulation #44123

Open
cyyever wants to merge 3 commits into huggingface:main from cyyever:cuda_sync3

Conversation

Contributor

@cyyever commented Feb 18, 2026

What does this PR do?

This PR avoids a device sync in training loss accumulation by using torch.where, so the loss value no longer has to be read back to the host to decide how to accumulate it. The is_torch_xla_available condition is also removed.
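A minimal sketch of the pattern, assuming the accumulation previously branched on an isnan/isinf check in Python; the names tr_loss, tr_loss_step, and steps_since_last_log are placeholders, not the Trainer's exact internals:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

tr_loss = torch.tensor(10.0, device=device)               # running accumulator
tr_loss_step = torch.tensor(float("nan"), device=device)  # current step's loss
steps_since_last_log = 4                                   # hypothetical counter

# Synchronizing version: the Python `if` must turn the tensor predicate into a
# bool, which blocks the host until the device has produced the value.
if torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step):
    synced = tr_loss + tr_loss / (1 + steps_since_last_log)
else:
    synced = tr_loss + tr_loss_step

# Sync-free version: torch.where selects between the two accumulations on the
# device, so the host never waits on the predicate.
unsynced = torch.where(
    torch.isnan(tr_loss_step) | torch.isinf(tr_loss_step),
    tr_loss + tr_loss / (1 + steps_since_last_log),
    tr_loss + tr_loss_step,
)

assert torch.equal(synced, unsynced)
```

Both forms compute the same result; the torch.where version simply keeps the branch on the device instead of round-tripping a boolean to the host.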

@Rocketknight1
Member

Hey @cyyever! We had some internal discussion about CUDA sync PRs - would you be able to provide a benchmark or profile graph showing the reduction in syncs? We might make those mandatory in the future, because it can sometimes be hard to tell from reading the code exactly where syncs would occur.
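For context, one way such syncs can be surfaced (an assumption about tooling, not something stated in this thread) is PyTorch's sync debug mode, which warns whenever an operation forces a device-to-host synchronization:

```python
import torch

# Sketch: warn (or use "error" to fail fast) on any hidden sync.
# Requires a CUDA device.
torch.cuda.set_sync_debug_mode("warn")

loss = torch.tensor(float("nan"), device="cuda")

# Branching on a tensor predicate calls bool(...), which copies the result
# to the host and triggers a sync warning here.
if torch.isnan(loss):
    pass

# The torch.where form stays on the device, so no warning is emitted.
_ = torch.where(torch.isnan(loss), torch.zeros_like(loss), loss)
```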

3 commits, each: Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>