Fixing gelu_checkpointing memory issue by RezaYazdaniAminabadi · Pull Request #812 · deepspeedai/DeepSpeed

RezaYazdaniAminabadi · 2021-03-02T23:31:06Z

This PR solves the memory error when enabling gelu-checkpoint. I also checked this modification for the rest of memory optimization flags inside transformer layer, and they all passed the unit tests.

@owmohamm, can you please verify if this is working on your side?
Thanks

owmohamm · 2021-03-03T00:55:49Z

Tested out the changes locally and are working fine. With my configuration which is analogous to Bert-Large memory savings are as follows

Flags Set	Memory Used
None	30252MiB
gelu_checkpoint	27308MiB
normalize_invertible	27650MiB
normalize_invertible + gelu_checkpoint	23184MiB
normalize_invertible + gelu_checkpoint + attn_dropout_checkpoint	21686MiB

* fixing buffers in transformer kernel when gelu-checkpoint is enabled * fixing the test issue for other memory optimization flags * fixing a bug for when attn_dropout_checkpoint is enabled Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

Reza Yazdani added 2 commits March 2, 2021 23:03

fixing buffers in transformer kernel when gelu-checkpoint is enabled

455ec72

fixing the test issue for other memory optimization flags

9e5ca61

RezaYazdaniAminabadi requested review from ShadenSmith, arashashari, awan-10, cli99, conglongli, eltonzheng, jeffra, minjiaz, niumanar, samyam and tjruwase as code owners March 2, 2021 23:31

fixing a bug for when attn_dropout_checkpoint is enabled

83acfad

eltonzheng approved these changes Mar 3, 2021

View reviewed changes

RezaYazdaniAminabadi merged commit 8295d7a into master Mar 3, 2021

mrwyattii deleted the transformer/fix-gelu-checkpoint branch July 7, 2023 02:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing gelu_checkpointing memory issue#812

Fixing gelu_checkpointing memory issue#812
RezaYazdaniAminabadi merged 3 commits intomasterfrom
transformer/fix-gelu-checkpoint

RezaYazdaniAminabadi commented Mar 2, 2021

Uh oh!

owmohamm commented Mar 3, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

RezaYazdaniAminabadi commented Mar 2, 2021

Uh oh!

owmohamm commented Mar 3, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants