Eliminate multi-tensor LAMB reduction in favor of applying reduce_square_sum to each tensor #6023
Closed
Conversation
Added 8 commits on October 29, 2020 21:15 — …lambdeterminism sync with master
wschin (Contributor) suggested changes on Dec 4, 2020:
Let's merge another PR which nicely fixes the randomness of Reduce in Lamb.
Author (Contributor): Agreed.
Description: Eliminate the multi-tensor LAMB reduction in favor of invoking reduce_square_sum individually on each tensor. This simplifies the existing code but comes at a 1% perf reduction for BERT-L (seqlen 128, batch size 64, gradient accumulation 1). The impact is smaller for gradient accumulation > 1, as in more realistic training scenarios.
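For illustration only, here is a minimal sketch of the per-tensor pattern. It uses thrust::transform_reduce as a stand-in for the actual ReduceSquareSum kernel, and the tensor sizes and loop are hypothetical, not taken from the real LAMB implementation:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cstdio>
#include <vector>

// Square each element before summing, giving the squared L2 norm of a tensor.
struct Square {
  __host__ __device__ float operator()(float x) const { return x * x; }
};

int main() {
  // Hypothetical tensor sizes standing in for a model's parameter tensors.
  std::vector<int> sizes = {1 << 20, 1 << 18, 4096};
  for (size_t t = 0; t < sizes.size(); ++t) {
    thrust::device_vector<float> x(sizes[t], 0.5f);
    // One reduction launch per tensor, instead of a single fused
    // multi-tensor reduction over all tensors at once.
    float norm_sq = thrust::transform_reduce(x.begin(), x.end(), Square(),
                                             0.0f, thrust::plus<float>());
    std::printf("tensor %zu: ||x||^2 = %f\n", t, norm_sq);
  }
  return 0;
}
```

The extra kernel launches per tensor are what account for the small perf cost noted above.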
Motivation and Context
This simplifies the existing code and makes it deterministic. It is also possible to make the existing multi-tensor LAMB reduction kernel deterministic by ordering the reduction across thread blocks; that alternative would not reduce perf.
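A hedged sketch of that ordering idea (not ORT's actual multi-tensor kernel): each thread block writes its partial sum to a slot fixed by blockIdx.x, and a second launch folds the partials in ascending block index, so the floating-point accumulation order never depends on block scheduling, unlike an atomicAdd-based cross-block accumulation:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void block_square_sums(const float* data, int n, float* partials) {
  __shared__ float shm[256];
  float local = 0.0f;
  // Grid-stride loop so any grid size covers the whole tensor.
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    local += data[i] * data[i];
  }
  shm[threadIdx.x] = local;
  __syncthreads();
  // In-block tree reduction.
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) shm[threadIdx.x] += shm[threadIdx.x + s];
    __syncthreads();
  }
  // Fixed slot per block: no cross-block atomics, so no scheduling-dependent order.
  if (threadIdx.x == 0) partials[blockIdx.x] = shm[0];
}

__global__ void fold_partials(const float* partials, int num_blocks, float* out) {
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    float total = 0.0f;
    for (int i = 0; i < num_blocks; ++i) total += partials[i];  // fixed order
    *out = total;
  }
}

int main() {
  const int kThreads = 256, kBlocks = 128, n = 1 << 22;
  std::vector<float> host(n, 0.25f);
  float *dev, *partials, *out;
  cudaMalloc(&dev, n * sizeof(float));
  cudaMalloc(&partials, kBlocks * sizeof(float));
  cudaMalloc(&out, sizeof(float));
  cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  block_square_sums<<<kBlocks, kThreads>>>(dev, n, partials);
  fold_partials<<<1, 1>>>(partials, kBlocks, out);

  float norm_sq = 0.0f;
  cudaMemcpy(&norm_sq, out, sizeof(float), cudaMemcpyDeviceToHost);
  std::printf("||x||^2 = %f (bitwise reproducible across runs)\n", norm_sq);
  cudaFree(dev); cudaFree(partials); cudaFree(out);
  return 0;
}
```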